ALTERNATING DIRECTION METHOD OF MULTIPLIERS FOR A
CLASS OF NONCONVEX AND NONSMOOTH PROBLEMS WITH 
APPLICATIONS TO BACKGROUND/FOREGROUND 
EXTRACTION 

LEI YANG*, TING KEI PONG*, AND XIAOJUN CHEN*

Abstract. In this paper, we study a general optimization model, which covers a large class of 
existing models for many applications in imaging sciences. To solve the resulting possibly nonconvex, 
nonsmooth and non-Lipschitz optimization problem, we adapt the alternating direction method of 
multipliers (ADMM) with a general dual step-size to solve a reformulation that contains three blocks 
of variables, and analyze its convergence. We show that for any dual step-size less than the golden 
ratio, there exists a computable threshold such that if the penalty parameter is chosen above such 
a threshold and the sequence thus generated by our ADMM is bounded, then the cluster point of 
the sequence gives a stationary point of the nonconvex optimization problem. We achieve this via a 
potential function specifically constructed for our ADMM. Moreover, we establish the global convergence
of the whole sequence if, in addition, this special potential function is a Kurdyka-Łojasiewicz
function. Furthermore, we present a simple strategy for initializing the algorithm to guarantee boundedness
of the sequence. Finally, we perform numerical experiments comparing our ADMM with the
proximal alternating linearized minimization (PALM) proposed in [5] on the background/foreground 
extraction problem with real data. The numerical results show that our ADMM with a nontrivial 
dual step-size is efficient. 

Key words. Nonsmooth and nonconvex optimization; alternating direction method of multipliers;
dual step-size; background/foreground extraction

1. Introduction. In this paper, we consider the following optimization problem: 

\[ \min_{L,S}\ \Psi(L) + \Phi(S) + \frac{1}{2}\|D - A[B(L) + C(S)]\|_F^2, \tag{1.1} \]


where 

• $\Psi, \Phi : \mathbb{R}^{m\times n} \to \mathbb{R}_+ \cup \{\infty\}$ are proper closed nonnegative functions, and $\Psi$ is
convex, while $\Phi$ is possibly nonconvex, nonsmooth and non-Lipschitz;

• $A, B, C : \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$ are linear maps and $B$, $C$ are injective.

In particular, $\Psi(L)$ and $\Phi(S)$ in (1.1) can be regularizers used for inducing the desired
structures. For instance, $\Psi(L)$ can be used for inducing low rank in $L$. One possible
choice is $\Psi(L) = \|L\|_*$ (see next section for notation and definitions). Alternatively,
one may consider $\Psi(L) = \delta_\Omega(L)$, where $\Omega$ is a compact convex set such as
$\Omega = \{L \in \mathbb{R}^{m\times n} \mid \|L\|_\infty \le l,\ L_{:1} = L_{:2} = \cdots = L_{:n}\}$ with $l > 0$, or $\Omega = \{L \in \mathbb{R}^{m\times n} \mid \|L\|_* \le r\}$
with $r > 0$; the former choice restricts $L$ to have rank at most 1 and makes (1.1)
nuclear-norm-free (see [30,33]). On the other hand, $\Phi(S)$ can be used for inducing
sparsity. In the literature, $\Phi(S)$ is typically separable, i.e., taking the form

\[ \Phi(S) = \mu \sum_{i=1}^{m}\sum_{j=1}^{n} \phi(S_{ij}), \tag{1.2} \]

where $\phi$ is a nonnegative continuous function with $\phi(0) = 0$ and $\mu > 0$ is a regularization
parameter. Some concrete examples of $\phi$ include:
1. bridge penalty [27,28]: $\phi(t) = |t|^p$ for $0 < p < 1$;

2. fraction penalty [20]: $\phi(t) = \alpha|t|/(1 + \alpha|t|)$ for $\alpha > 0$;

3. logistic penalty [39]: $\phi(t) = \log(1 + \alpha|t|)$ for $\alpha > 0$;

4. smoothly clipped absolute deviation [16]: $\phi(t) = \int_0^{|t|} \min\big(1, (\alpha - s/\mu)_+/(\alpha - 1)\big)\, ds$ for $\alpha > 2$;

5. minimax concave penalty [49]: $\phi(t) = \int_0^{|t|} \big(1 - s/(\alpha\mu)\big)_+\, ds$ for $\alpha > 0$;

6. hard thresholding penalty function [17]: (j){t) = fi — {fi— 

The bridge penalty and the logistic penalty have also been considered in [13]. Finally, 
the linear map A can be suitably chosen to model different scenarios. For example, 
A can be chosen to be the identity map for extracting L and S from noisy data D,
and the blurring map for blurred data D. The linear map B can be the identity
map or some “dictionary” that spans the data space (see, for example, [34]), and C 
can be chosen to be the identity map or the inverse of certain sparsifying transform 
(see, for example, [40]). More examples of (1.1) can be found in [8-10,13,41,47]. 
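As an illustration of the separable form (1.2), the following is a minimal Python sketch of four of the penalties listed above (bridge, fraction, logistic and minimax concave). The function names and default parameter values are hypothetical choices made for this example, and the minimax concave penalty is written in the closed form obtained by evaluating its defining integral.

```python
import math


def bridge(t, p=0.5):
    """Bridge penalty |t|^p with 0 < p < 1."""
    return abs(t) ** p


def fraction(t, alpha=1.0):
    """Fraction penalty alpha|t| / (1 + alpha|t|) with alpha > 0."""
    return alpha * abs(t) / (1.0 + alpha * abs(t))


def logistic(t, alpha=1.0):
    """Logistic penalty log(1 + alpha|t|) with alpha > 0."""
    return math.log(1.0 + alpha * abs(t))


def mcp(t, alpha=2.0, mu=1.0):
    """Minimax concave penalty: integral of (1 - s/(alpha*mu))_+ over [0, |t|].

    The integrand vanishes beyond the knee alpha*mu, so the integral is
    |t| - t^2 / (2*alpha*mu) up to the knee and alpha*mu / 2 afterwards.
    """
    a = abs(t)
    knee = alpha * mu
    if a <= knee:
        return a - a * a / (2.0 * knee)
    return knee / 2.0


def separable_penalty(S, phi, mu=1.0):
    """Phi(S) = mu * sum over all entries of phi(S_ij); S is a list of rows."""
    return mu * sum(phi(s) for row in S for s in row)
```

Each function satisfies $\phi(0) = 0$ and is nonnegative, as required of $\phi$ in (1.2); for example, `separable_penalty(S, bridge, mu)` evaluates the bridge-regularized $\Phi$ at a matrix `S` stored as a list of rows.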

One representative application that is frequently modeled by (1.1) via a suitable
choice of $\Phi$, $\Psi$, $A$, $B$ and $C$ is the background/foreground extraction problem, which is
an important problem in video processing; see [6,7] for recent surveys. In this problem,
one attempts to separate the relatively static information called “background” and
the moving objects called “foreground” in a video. The problem can be modeled by
(1.1), and such models are typically referred to as RPCA-based models. In these
models, each image is stacked as a column of a data matrix $D$, the relatively static
background is then modeled as a low rank matrix, while the moving foreground is
modeled as sparse outliers. The data matrix $D$ is then decomposed (approximately)
as the sum of a low rank matrix $L \in \mathbb{R}^{m\times n}$ modeling the background and a sparse
matrix $S \in \mathbb{R}^{m\times n}$ modeling the foreground. Various approximations are then used

to induce low rank and sparsity, resulting in different RPCA-based models, most of 
which take the form of (1.1). One example is to set $\Psi$ to be the nuclear norm of $L$,
i.e., the sum of singular values of $L$, to promote low rank in $L$ and $\Phi$ to be the $\ell_1$
norm of $S$ to promote sparsity in $S$, as in [10]. Besides convex regularizers, nonconvex
models have also been widely studied recently and their performances are promising; 
see [13, 44] for background/foreground extraction and [4,12, 22, 38, 39, 50] for other 
problems in image processing. There are also nuclear-norm-free models that do not 
require matrix decomposition of the matrix variable L when solving them, making 
the model more practical especially when the size of matrix is large. For instance, 
in [30], the authors set $\Phi$ to be the $\ell_1$ norm of $S$ and $\Psi$ to be the indicator function of
$\Omega = \{L \in \mathbb{R}^{m\times n} \mid L_{:1} = L_{:2} = \cdots = L_{:n}\}$. A similar approach was also adopted in [33]
with promising performances. Clearly, for nuclear-norm-free models, one can also take
$\Phi$ to be some nonconvex sparsity inducing regularizers, resulting in a special case of
(1.1) that has not been explicitly considered in the literature before; we will consider
these models in our numerical experiments in Section 5. The above discussion shows 
that problem (1.1) is flexible enough to cover a wide range of RPCA-based models 
for background/foreground extraction. 

Problem (1.1), though nonconvex in general, can be reformulated, as we will show in Section 3,
into an optimization problem with three blocks of variables. Problems of this
kind, containing several blocks of variables, have been widely studied in the
literature; see, for example, [30,37,41]. Hence, it is natural to adapt the algorithm used
there, namely, the alternating direction method of multipliers (ADMM), for solving
(1.1). Classically, the ADMM can be applied to solving problems of the following


ADMM FOR NONCONVEX AND NONSMOOTH PROBLEMS 




form that contains 2 blocks of variables: 

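\[ \min_{x_1, x_2}\ \big\{ f_1(x_1) + f_2(x_2) \ \big|\ \mathcal{A}_1(x_1) + \mathcal{A}_2(x_2) = b \big\}, \tag{1.3} \]

where $f_1$ and $f_2$ are proper closed convex functions, and $\mathcal{A}_1$ and $\mathcal{A}_2$ are linear operators.
The iterative scheme of ADMM is

\[ \begin{cases} x_1^{k+1} \in \operatorname{Argmin}_{x_1}\ \mathcal{L}_\beta(x_1, x_2^k, z^k), \\ x_2^{k+1} \in \operatorname{Argmin}_{x_2}\ \mathcal{L}_\beta(x_1^{k+1}, x_2, z^k), \\ z^{k+1} = z^k - \tau\beta\big(\mathcal{A}_1(x_1^{k+1}) + \mathcal{A}_2(x_2^{k+1}) - b\big), \end{cases} \]

where $\tau \in \big(0, \frac{1+\sqrt{5}}{2}\big)$ is the dual step-size and $\mathcal{L}_\beta$ is the augmented Lagrangian function
for (1.3) defined as

\[ \mathcal{L}_\beta(x_1, x_2, z) := f_1(x_1) + f_2(x_2) - \langle z, \mathcal{A}_1(x_1) + \mathcal{A}_2(x_2) - b\rangle + \frac{\beta}{2}\|\mathcal{A}_1(x_1) + \mathcal{A}_2(x_2) - b\|^2 \]

with $\beta > 0$ being the penalty parameter. Under some mild conditions, the sequence
$\{(x_1^k, x_2^k)\}$ generated by the above ADMM can be shown to converge to an optimal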
solution of (1.3); see for example, [3,15,19,21]. However, the ADMM used in [30, 
37,41] does not have a convergence guarantee; indeed, it is shown recently in [11] 
that the ADMM, when applied to a convex optimization problem with 3 blocks of 
variables, can be divergent in general. This motivates the study of many provably 
convergent variants of the ADMM for convex problems with more than 2 blocks of 
variables; see, for example, [24,25,35,36]. Recently, Hong et al. [26] established 
the convergence of the multi-block ADMM for certain types of nonconvex problems 
whose objective is a sum of a possibly nonconvex Lipschitz differentiable function 
and several convex nonsmooth functions when the penalty parameter is chosen
above a computable threshold. The problem they considered covers (1.1) when $\Phi$
is convex, or smooth and possibly nonconvex. Later, Wang et al. [44] considered a
more general type of nonconvex problems that contains (1.1) as a special case and 
allows some nonconvex nonsmooth functions in the objective. To solve this type of 
problems, they considered a variant of the ADMM whose subproblems are simplified 
by adding a Bregman proximal term. However, their results cannot be applied to the 
direct adaptation of the ADMM for solving (1.1). 

In this paper, following the studies in [26,44] on convergence of nonconvex ADMM 
and its variant, and the recent studies in [1,31,45], we manage to analyze the convergence
of the ADMM applied to solving the possibly nonconvex problem (1.1). In
addition, we would like to point out that all the aforementioned nonconvex ADMM
have a dual step-size of $\tau = 1$. While it is known that the classical ADMM converges
for any $\tau \in \big(0, \frac{1+\sqrt{5}}{2}\big)$ for convex problems, and that empirically $\tau \approx \frac{1+\sqrt{5}}{2}$ works
best (see, for example, [18,19,21,36]), to our knowledge, the algorithm with a dual
step-size $\tau \ne 1$ has never been studied in the nonconvex scenarios. Thus, we also
study the ADMM with a general dual step-size, which will allow more flexibility in 
the design of algorithms. 

The contributions of this paper are as follows: 

1. We show that for any positive dual step-size $\tau$ less than the golden ratio,
any cluster point of the sequence generated by our ADMM gives a stationary
point of (1.1) if the penalty parameter is chosen above a computable threshold
depending on $\tau$, whenever the sequence is bounded. We achieve this via a
potential function specifically constructed for our ADMM. To the best of our
knowledge, this is the first convergence result for the ADMM in the nonconvex
scenario with a possibly nontrivial dual step-size ($\tau \ne 1$). This result is also
new for the convex scenario for the multi-block ADMM.

2. We establish global convergence of the whole sequence generated by the 
ADMM under the additional assumption that the special potential function 
is a Kurdyka-Łojasiewicz function. Following the discussions in [2, Section 4],
one can check that this condition is satisfied for all the aforementioned $\phi$.

3. Furthermore, we discuss an initialization strategy to guarantee the boundedness
of the sequence generated by the ADMM.

We also conduct numerical experiments to evaluate the performance of our ADMM
by using different nonconvex regularizers and real data. Our computational results 
illustrate the efficiency of our ADMM with a nontrivial dual step-size. 

The rest of this paper is organized as follows. We present notation and preliminaries
in Section 2. The ADMM for (1.1) is described in Section 3. We analyze the
convergence of the method in Section 4. Numerical results are presented in Section 5, 
with some concluding remarks given in Section 6. 

2. Notation and preliminaries. In this paper, we use $\mathbb{R}^{m\times n}$ to denote the set
of all $m \times n$ matrices. For a matrix $X \in \mathbb{R}^{m\times n}$, we let $X_{ij}$ denote its $(i,j)$th entry and
$X_{:j}$ denote its $j$th column. The number of nonzero entries in $X$ is denoted by $\|X\|_0$
and the largest entry in magnitude is denoted by $\|X\|_\infty$. Moreover, the Frobenius
norm is denoted by $\|X\|_F$, the nuclear norm is denoted by $\|X\|_*$, which is the sum of
singular values of $X$; and the $\ell_1$-norm and $\ell_p$-quasi-norm ($0 < p < 1$) are given by
$\|X\|_1 := \sum_{i=1}^{m}\sum_{j=1}^{n} |X_{ij}|$ and $\|X\|_p := \big(\sum_{i=1}^{m}\sum_{j=1}^{n} |X_{ij}|^p\big)^{1/p}$, respectively. Furthermore, for
two matrices $X$ and $Y$ of the same size, we denote their trace inner product by
$\langle X, Y\rangle := \sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij}Y_{ij}$. Finally, for the linear map $A : \mathbb{R}^{m\times n} \to \mathbb{R}^{m\times n}$ in (1.1),
its adjoint is denoted by $A^*$, while the largest (resp., smallest) eigenvalue of the linear
map $A^*A$ is denoted by $\lambda_{\max}$ (resp., $\lambda_{\min}$). The identity map is denoted by $I$.

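For an extended-real-valued function $f : \mathbb{R}^{m\times n} \to [-\infty, \infty]$, we say that it is
proper if $f(X) > -\infty$ for all $X \in \mathbb{R}^{m\times n}$ and its domain $\operatorname{dom} f := \{X \in \mathbb{R}^{m\times n} \mid f(X) < \infty\}$
is nonempty. For a proper function $f$, we use the notation $Y \xrightarrow{f} X$ to denote
$Y \to X$ and $f(Y) \to f(X)$. Our basic (limiting-)subdifferential [42, Definition 8.3] of
$f$ at $X \in \operatorname{dom} f$ used in this paper, denoted by $\partial f(X)$, is defined as

\[ \partial f(X) := \big\{ D \in \mathbb{R}^{m\times n} : \exists\, X^k \xrightarrow{f} X \text{ and } D^k \to D \text{ with } D^k \in \hat{\partial} f(X^k) \text{ for all } k \big\}, \]

where $\hat{\partial} f(U)$ denotes the Fréchet subdifferential of $f$ at $U \in \operatorname{dom} f$, which is the set
of all $D \in \mathbb{R}^{m\times n}$ satisfying

\[ \liminf_{Y \ne U,\ Y \to U} \frac{f(Y) - f(U) - \langle D, Y - U\rangle}{\|Y - U\|_F} \ge 0. \]

From the above definition, we can easily observe that

\[ \big\{ D \in \mathbb{R}^{m\times n} : \exists\, X^k \xrightarrow{f} X,\ D^k \to D,\ D^k \in \partial f(X^k) \big\} \subseteq \partial f(X). \tag{2.1} \]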

We also recall that when $f$ is continuously differentiable or convex, the above subdifferential
coincides with the classical concept of derivative or convex subdifferential of
$f$; see, for example, [42, Exercise 8.8] and [42, Proposition 8.12]. Moreover, from the
generalized Fermat's rule [42, Theorem 10.1], we know that if $X \in \mathbb{R}^{m\times n}$ is a local
minimizer of $f$, then $0 \in \partial f(X)$. Additionally, for a function $f$ with several groups
of variables, we write $\partial_X f$ (resp., $\nabla_X f$) for the subdifferential (resp., derivative) of $f$
with respect to the group of variables $X$.

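For a compact convex set $\Omega \subseteq \mathbb{R}^{m\times n}$, its indicator function $\delta_\Omega$ is defined by

\[ \delta_\Omega(X) = \begin{cases} 0 & \text{if } X \in \Omega, \\ +\infty & \text{otherwise}. \end{cases} \]

The normal cone of $\Omega$ at the point $X \in \Omega$ is given by $N_\Omega(X) = \partial\delta_\Omega(X)$. We also use
$\operatorname{dist}(X, \Omega)$ to denote the distance from $X$ to $\Omega$, i.e., $\operatorname{dist}(X, \Omega) := \inf_{Y \in \Omega} \|X - Y\|_F$,
and $P_\Omega(X)$ to denote the unique closest point to $X$ in $\Omega$.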

Next, we recall the Kurdyka-Łojasiewicz (KL) property, which plays an important
role in our global convergence analysis. For notational simplicity, we use $\Xi_\eta$ ($\eta > 0$)
to denote the class of concave functions $\varphi : [0, \eta) \to \mathbb{R}_+$ satisfying: (1) $\varphi(0) = 0$; (2)
$\varphi$ is continuously differentiable on $(0, \eta)$ and continuous at 0; (3) $\varphi'(x) > 0$ for all
$x \in (0, \eta)$. Then the KL property can be described as follows.

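Definition 2.1 (KL property and KL function). Let $f$ be a proper lower
semicontinuous function.

(i) For $\tilde{X} \in \operatorname{dom} \partial f := \{X \in \mathbb{R}^{m\times n} : \partial f(X) \ne \emptyset\}$, if there exist an $\eta \in (0, +\infty]$,
a neighborhood $V$ of $\tilde{X}$ and a function $\varphi \in \Xi_\eta$ such that for all
$X \in V \cap \{X \in \mathbb{R}^{m\times n} : f(\tilde{X}) < f(X) < f(\tilde{X}) + \eta\}$, it holds that

\[ \varphi'\big(f(X) - f(\tilde{X})\big)\operatorname{dist}\big(0, \partial f(X)\big) \ge 1, \]

then $f$ is said to have the Kurdyka-Łojasiewicz (KL) property at $\tilde{X}$.

(ii) If $f$ satisfies the KL property at each point of $\operatorname{dom} \partial f$, then $f$ is called a KL
function.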

We refer the interested readers to [2] and references therein for examples of KL 
functions. We also recall the following uniformized KL property, which was established 
in [5, Lemma 6]. 

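Proposition 2.2 (Uniformized KL property). Suppose that $f$ is a proper
lower semicontinuous function and $\Gamma$ is a compact set. If $f \equiv f^*$ on $\Gamma$ for some
constant $f^*$ and satisfies the KL property at each point of $\Gamma$, then there exist $\varepsilon > 0$,
$\eta > 0$ and $\varphi \in \Xi_\eta$ such that

\[ \varphi'\big(f(X) - f^*\big)\operatorname{dist}\big(0, \partial f(X)\big) \ge 1 \]

for all $X \in \{X \in \mathbb{R}^{m\times n} : \operatorname{dist}(X, \Gamma) < \varepsilon\} \cap \{X \in \mathbb{R}^{m\times n} : f^* < f(X) < f^* + \eta\}$.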

Before ending this section, we discuss first-order necessary conditions for (1.1). 
First, recall that (1.1) is the same as

\[ \min_{L,S}\ F(L, S) := \Psi(L) + \Phi(S) + \frac{1}{2}\|D - A[B(L) + C(S)]\|_F^2. \]

Hence, from [42, Theorem 10.1], we have $0 \in \partial F(\bar{L}, \bar{S})$ at any local minimizer $(\bar{L}, \bar{S})$
of (1.1). On the other hand, from [42, Exercise 8.8] and [42, Proposition 10.5], we see
that

\[ \partial F(L, S) = \begin{pmatrix} \partial\Psi(L) + B^*A^*\big(A(B(L) + C(S)) - D\big) \\ \partial\Phi(S) + C^*A^*\big(A(B(L) + C(S)) - D\big) \end{pmatrix}. \]

Consequently, the first-order necessary conditions of (1.1) at the local minimizer $(\bar{L}, \bar{S})$
are given by:

\[ \begin{cases} 0 \in \partial\Psi(\bar{L}) + B^*A^*\big(A(B(\bar{L}) + C(\bar{S})) - D\big), \\ 0 \in \partial\Phi(\bar{S}) + C^*A^*\big(A(B(\bar{L}) + C(\bar{S})) - D\big). \end{cases} \tag{2.2} \]

In this paper, we say that $(L^*, S^*)$ is a stationary point of (1.1) if $(L^*, S^*)$ satisfies
(2.2) in place of $(\bar{L}, \bar{S})$.

3. Alternating direction method of multipliers. In this section, we present 
an ADMM for solving (1.1), which can be equivalently written as 

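\[ \begin{array}{rl} \min\limits_{L,S,Z} & \Psi(L) + \Phi(S) + \frac{1}{2}\|D - A(Z)\|_F^2 \\ \text{s.t.} & B(L) + C(S) = Z. \end{array} \tag{3.1} \]

To describe the iterates of the ADMM, we first introduce the augmented Lagrangian
function of the above optimization problem:

\[ \mathcal{L}_\beta(L, S, Z, \Lambda) = \Psi(L) + \Phi(S) + \frac{1}{2}\|D - A(Z)\|_F^2 - \langle \Lambda, B(L) + C(S) - Z\rangle + \frac{\beta}{2}\|B(L) + C(S) - Z\|_F^2, \]

where $\Lambda \in \mathbb{R}^{m\times n}$ is the Lagrangian multiplier and $\beta > 0$ is the penalty parameter.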
The ADMM for solving (3.1) (equivalently (1.1)) is then presented as follows: 

Algorithm 1 ADMM for solving (3.1)

Input: Initial point $(S^0, Z^0, \Lambda^0)$, dual step-size parameter $\tau > 0$, penalty parameter $\beta > 0$, $k = 0$

while a termination criterion is not met, do

Step 1. Set

\[ L^{k+1} \in \operatorname{Argmin}_{L}\ \mathcal{L}_\beta(L, S^k, Z^k, \Lambda^k), \tag{3.2a} \]
\[ S^{k+1} \in \operatorname{Argmin}_{S}\ \mathcal{L}_\beta(L^{k+1}, S, Z^k, \Lambda^k), \tag{3.2b} \]
\[ Z^{k+1} = \operatorname{argmin}_{Z}\ \mathcal{L}_\beta(L^{k+1}, S^{k+1}, Z, \Lambda^k), \tag{3.2c} \]
\[ \Lambda^{k+1} = \Lambda^k - \tau\beta\big(B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\big). \]

Step 2. Set $k := k + 1$
end while
Output: $(L^k, S^k)$

Comparing with the ADMM considered in [26], the above algorithm has an extra
dual step-size parameter $\tau > 0$ in the $\Lambda$-update. Such a dual step-size was introduced
in [19,21] for the classical ADMM (i.e., for convex problems with two separate
blocks of variables), and was further studied in [18,36,43,48] for other variants of
the ADMM. Numerically, it was also demonstrated in [43] that a larger dual step-size
($\tau \approx \frac{1+\sqrt{5}}{2}$) results in faster convergence for the convex problems they consider. Thus,
we adapt this dual step-size $\tau$ in our algorithm above. Surprisingly, in our numerical
experiments, a parameter choice of $\tau \approx \frac{1+\sqrt{5}}{2}$ leads to the worst performance for our
nonconvex problems.

When $\tau = 1$, the above algorithm is a special case of the general algorithm
studied in [26] when $\Psi$ and $\Phi$ are smooth functions, or convex nonsmooth functions.
The algorithm is shown to converge when $\beta$ is chosen above a computable threshold.
However, their convergence result cannot be directly applied when $\tau \ne 1$ or when $\Phi$
is nonsmooth and nonconvex. Nevertheless, following their analysis and the related 
studies [31,44,45], the above algorithm can be shown to be convergent under suitable 
assumptions. We will present the convergence analysis in Section 4. 

Before ending this section, we further discuss the three subproblems in Algorithm 1.
First, notice that the $L$-update and $S$-update are given by

\[ \begin{cases} L^{k+1} \in \operatorname{Argmin}_{L} \Big\{ \Psi(L) + \frac{\beta}{2}\big\|B(L) + C(S^k) - Z^k - \frac{1}{\beta}\Lambda^k\big\|_F^2 \Big\}, \\ S^{k+1} \in \operatorname{Argmin}_{S} \Big\{ \Phi(S) + \frac{\beta}{2}\big\|B(L^{k+1}) + C(S) - Z^k - \frac{1}{\beta}\Lambda^k\big\|_F^2 \Big\}. \end{cases} \]

In general, these two subproblems are not easy to solve. However, when $\Psi$ and
$\Phi$ are chosen to be some common regularizers used in the literature, for example,
$\Psi(L) = \|L\|_*$ and $\Phi(S) = \|S\|_1$, then these subproblems can be solved efficiently via
the proximal gradient method. Additionally, when $\Psi(L) = \delta_\Omega(L)$ with $\Omega$ being a
closed convex set and $B = I$, the $L$-update can be given explicitly by

\[ L^{k+1} = P_\Omega\Big(-C(S^k) + Z^k + \frac{1}{\beta}\Lambda^k\Big), \]

which can be computed efficiently if $\Omega$ is simple, for example, when
$\Omega = \{L \in \mathbb{R}^{m\times n} \mid \|L\|_\infty \le l,\ L_{:1} = L_{:2} = \cdots = L_{:n}\}$ for some $l > 0$. For the $S$-update,
when $\Phi$ is given by (1.2) with $\phi$ being one of the penalty functions presented in the
introduction and $C = I$, it can be solved efficiently via a simple root-finding procedure.
Finally, from the optimality conditions of (3.2c), $Z^{k+1}$ can be obtained by
solving the following linear system

\[ A^*A(Z) + \beta Z = A^*(D) - \Lambda^k + \beta\big(B(L^{k+1}) + C(S^{k+1})\big), \]

whose complexity would depend on the choice of $A$ in our model (1.1). For example,
when $A$ is just the identity map, $Z^{k+1}$ is given explicitly by

\[ Z^{k+1} = \frac{1}{1+\beta}\Big[D - \Lambda^k + \beta\big(B(L^{k+1}) + C(S^{k+1})\big)\Big]. \]
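The explicit updates discussed above can be combined into a short Python sketch of Algorithm 1 for one simple instance. The instance is an assumption of this example rather than the setting of the experiments in this paper: $A = B = C = I$, $\Psi = \delta_\Omega$ with $\Omega$ the set of matrices with identical columns and entries bounded by `bound`, and $\Phi = \mu\|\cdot\|_1$, for which the $S$-update reduces to entrywise soft-thresholding. All function names, default values and the fixed iteration count are hypothetical.

```python
def soft_threshold(x, thr):
    """Minimizer of thr*|s| + 0.5*(s - x)^2 over s (closed form for l1)."""
    if x > thr:
        return x - thr
    if x < -thr:
        return x + thr
    return 0.0


def project_omega(X, bound):
    """Project X onto {L : |L_ij| <= bound, all columns of L equal}.

    Equal columns means every row is constant, so the nearest point in the
    Frobenius norm uses the mean of each row, clipped to [-bound, bound].
    """
    n = len(X[0])
    out = []
    for row in X:
        v = max(-bound, min(bound, sum(row) / n))
        out.append([v] * n)
    return out


def admm(D, mu=0.1, bound=1.0, beta=2.0, tau=1.0, iters=200):
    """Sketch of Algorithm 1 with A = B = C = identity (assumed here)."""
    m, n = len(D), len(D[0])
    idx = [(i, j) for i in range(m) for j in range(n)]
    S = [[0.0] * n for _ in range(m)]
    Z = [row[:] for row in D]
    Lam = [[0.0] * n for _ in range(m)]
    L = [[0.0] * n for _ in range(m)]
    for _ in range(iters):
        # L-update: projection of -S^k + Z^k + Lam^k/beta onto Omega.
        T = [[0.0] * n for _ in range(m)]
        for i, j in idx:
            T[i][j] = Z[i][j] + Lam[i][j] / beta - S[i][j]
        L = project_omega(T, bound)
        # S-update: entrywise soft-thresholding with threshold mu/beta.
        S = [[soft_threshold(Z[i][j] + Lam[i][j] / beta - L[i][j], mu / beta)
              for j in range(n)] for i in range(m)]
        # Z-update: explicit formula for A = identity.
        Z_new = [[(D[i][j] - Lam[i][j] + beta * (L[i][j] + S[i][j])) / (1.0 + beta)
                  for j in range(n)] for i in range(m)]
        # Multiplier update with dual step-size tau.
        Lam = [[Lam[i][j] - tau * beta * (L[i][j] + S[i][j] - Z_new[i][j])
                for j in range(n)] for i in range(m)]
        Z = Z_new
    return L, S, Z, Lam
```

A call such as `admm(D, tau=0.8)` with `D` given as a list of rows returns the last iterate; a practical implementation would replace the fixed iteration count by a termination criterion and would choose `beta` with the threshold from the convergence analysis in mind.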

4. Convergence analysis. In this section, we discuss the convergence of Algorithm 1
for $0 < \tau < \frac{1+\sqrt{5}}{2}$. We first present the first-order optimality conditions
for the subproblems in Algorithm 1 as follows, which will be used repeatedly in our
convergence analysis below.

\[ 0 \in \partial\Psi(L^{k+1}) - B^*(\Lambda^k) + \beta B^*\big(B(L^{k+1}) + C(S^k) - Z^k\big), \tag{4.1a} \]
\[ 0 \in \partial\Phi(S^{k+1}) - C^*(\Lambda^k) + \beta C^*\big(B(L^{k+1}) + C(S^{k+1}) - Z^k\big), \tag{4.1b} \]
\[ 0 = A^*\big(A(Z^{k+1}) - D\big) + \Lambda^k - \beta\big(B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\big), \tag{4.1c} \]
\[ \Lambda^{k+1} - \Lambda^k = -\tau\beta\big(B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\big). \tag{4.1d} \]
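Our convergence analysis is largely based on the following potential function:

\[ \Theta_{\tau,\beta}(L, S, Z, \Lambda) = \mathcal{L}_\beta(L, S, Z, \Lambda) + \theta(\tau)\beta\|B(L) + C(S) - Z\|_F^2, \]

where

\[ \theta(\tau) := \max\Big\{1 - \tau,\ \frac{(\tau - 1)\tau^2}{1 + \tau - \tau^2}\Big\} \quad \text{for } 0 < \tau < \frac{1+\sqrt{5}}{2}. \tag{4.2} \]

Note that $\theta(\cdot)$ is a convex and nonnegative function on $\big(0, \frac{1+\sqrt{5}}{2}\big)$. Thus, for any
$(L, S, Z, \Lambda)$, we have $\Theta_{\tau,\beta}(L, S, Z, \Lambda) \ge \mathcal{L}_\beta(L, S, Z, \Lambda)$ for $0 < \tau < \frac{1+\sqrt{5}}{2}$, and the
equality holds when $\tau = 1$ (so that $\theta(\tau) = 0$).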

Our convergence analysis also relies on the following assumption. 

Assumption 4.1. $\Psi$, $\Phi$, $B$, $C$, $\beta$ and $\tau$ satisfy
(a1) $B^*B \succeq \sigma I$ for some $\sigma > 0$ and $C^*C \succeq \sigma' I$ for some $\sigma' > 0$;

(a2) $\Psi$ is continuous in its domain;

(a3) the first iterate $(L^1, S^1, Z^1, \Lambda^1)$ satisfies

\[ \Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^1) < h_0 := \liminf_{\|L\|_F + \|S\|_F \to \infty} \Psi(L) + \Phi(S). \]

Remark 4.1 (Note on Assumption 4.1). (i) Since $B$ and $C$ in (1.1) are injective,
(a1) holds trivially; (ii) (a2) holds for many common regularizers (for example,
the nuclear norm) or the indicator function of a set; (iii) (a3) places conditions on
the first iterate of the algorithm. It is not hard to observe that this assumption holds
trivially if both $\Psi$ and $\Phi$ are coercive, i.e., if $\liminf_{\|L\|_F + \|S\|_F \to \infty} \Psi(L) + \Phi(S) = \infty$. We will
discuss more sufficient conditions for this assumption after our convergence results,
i.e., after Theorem 4.4.

We now start our convergence analysis by proving the following preparatory 
lemma, which states that the potential function is decreasing along the sequence 
generated from Algorithm 1 if the penalty parameter /3 is chosen above a computable 
threshold. 

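Lemma 4.1. Suppose that $0 < \tau < \frac{1+\sqrt{5}}{2}$ and $\{(L^k, S^k, Z^k, \Lambda^k)\}$ is a sequence
generated by Algorithm 1. If (a1) in Assumption 4.1 holds, then for $k \ge 1$, we have

\[ \begin{aligned} &\Theta_{\tau,\beta}(L^{k+1}, S^{k+1}, Z^{k+1}, \Lambda^{k+1}) - \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) \\ &\quad \le \Big(\max\Big\{\frac{1}{\tau}, \frac{\tau^2}{1 + \tau - \tau^2}\Big\} \cdot \frac{\lambda_{\max}^2}{\beta} - \frac{\lambda_{\min} + \beta}{2}\Big)\|Z^{k+1} - Z^k\|_F^2 - \frac{\sigma\beta}{2}\|L^{k+1} - L^k\|_F^2. \end{aligned} \tag{4.3} \]

Moreover, if $\beta > \frac{1}{2}\Big(-\lambda_{\min} + \sqrt{\lambda_{\min}^2 + \max\big\{\frac{1}{\tau}, \frac{\tau^2}{1 + \tau - \tau^2}\big\} \cdot 8\lambda_{\max}^2}\Big)$, then the sequence
$\{\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ is decreasing.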

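Proof. We start our proof by noticing that

\[ \begin{aligned} &\Theta_{\tau,\beta}(L^{k+1}, S^{k+1}, Z^{k+1}, \Lambda^{k+1}) - \Theta_{\tau,\beta}(L^{k+1}, S^{k+1}, Z^{k+1}, \Lambda^k) \\ &\quad = -\langle \Lambda^{k+1} - \Lambda^k, B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\rangle = \frac{1}{\tau\beta}\|\Lambda^{k+1} - \Lambda^k\|_F^2, \end{aligned} \tag{4.4} \]

where the last equality follows from (4.1d). We next derive an upper bound of
$\|\Lambda^{k+1} - \Lambda^k\|_F^2$. To proceed, we first note from (4.1c) that

\[ \begin{aligned} 0 &= A^*\big(A(Z^{k+1}) - D\big) + \Lambda^k - \beta\big(B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\big) \\ &= A^*\big(A(Z^{k+1}) - D\big) + \Lambda^k + \frac{1}{\tau}(\Lambda^{k+1} - \Lambda^k) \\ \Longrightarrow\ \Lambda^{k+1} &= \tau A^*\big(D - A(Z^{k+1})\big) + (1 - \tau)\Lambda^k, \end{aligned} \]

where the second equality follows from (4.1d). Hence, for $k \ge 1$,

\[ \begin{aligned} \Lambda^{k+1} - \Lambda^k &= \big[\tau A^*\big(D - A(Z^{k+1})\big) + (1 - \tau)\Lambda^k\big] - \big[\tau A^*\big(D - A(Z^k)\big) + (1 - \tau)\Lambda^{k-1}\big] \\ &= \tau A^*A(Z^k - Z^{k+1}) + (1 - \tau)(\Lambda^k - \Lambda^{k-1}). \end{aligned} \tag{4.5} \]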

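We now consider two separate cases: $0 < \tau \le 1$ and $1 < \tau < \frac{1+\sqrt{5}}{2}$.

• For $0 < \tau \le 1$, it follows from the convexity of $\|\cdot\|_F^2$ that

\[ \begin{aligned} \|\Lambda^{k+1} - \Lambda^k\|_F^2 &= \|\tau A^*A(Z^k - Z^{k+1}) + (1 - \tau)(\Lambda^k - \Lambda^{k-1})\|_F^2 \\ &\le \tau\lambda_{\max}^2\|Z^{k+1} - Z^k\|_F^2 + (1 - \tau)\|\Lambda^k - \Lambda^{k-1}\|_F^2. \end{aligned} \]

We further add $-(1 - \tau)\|\Lambda^{k+1} - \Lambda^k\|_F^2$ to both sides of the above inequality
and simplify the resulting inequality to get

\[ \begin{aligned} \|\Lambda^{k+1} - \Lambda^k\|_F^2 &\le \lambda_{\max}^2\|Z^{k+1} - Z^k\|_F^2 + \frac{1 - \tau}{\tau}\big(\|\Lambda^k - \Lambda^{k-1}\|_F^2 - \|\Lambda^{k+1} - \Lambda^k\|_F^2\big) \\ &= (1 - \tau)\tau\beta^2\big(\|B(L^k) + C(S^k) - Z^k\|_F^2 - \|B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\|_F^2\big) \\ &\quad + \lambda_{\max}^2\|Z^{k+1} - Z^k\|_F^2, \end{aligned} \tag{4.6} \]

where the last equality follows from (4.1d).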

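• For $1 < \tau < \frac{1+\sqrt{5}}{2}$, dividing $\tau$ from both sides of (4.5), we have

\[ \begin{aligned} \frac{1}{\tau}(\Lambda^{k+1} - \Lambda^k) &= A^*A(Z^k - Z^{k+1}) + \Big(\frac{1}{\tau} - 1\Big)(\Lambda^k - \Lambda^{k-1}) \\ &= \frac{1}{\tau} \cdot \tau A^*A(Z^k - Z^{k+1}) + \Big(1 - \frac{1}{\tau}\Big)(\Lambda^{k-1} - \Lambda^k). \end{aligned} \]

This together with $0 < \frac{1}{\tau} < 1$ and the convexity of $\|\cdot\|_F^2$ implies that

\[ \begin{aligned} \Big\|\frac{1}{\tau}(\Lambda^{k+1} - \Lambda^k)\Big\|_F^2 &\le \frac{1}{\tau}\|\tau A^*A(Z^k - Z^{k+1})\|_F^2 + \Big(1 - \frac{1}{\tau}\Big)\|\Lambda^{k-1} - \Lambda^k\|_F^2 \\ &\le \tau\lambda_{\max}^2\|Z^{k+1} - Z^k\|_F^2 + \Big(1 - \frac{1}{\tau}\Big)\|\Lambda^k - \Lambda^{k-1}\|_F^2 \\ \Longrightarrow\ \|\Lambda^{k+1} - \Lambda^k\|_F^2 &\le \tau^3\lambda_{\max}^2\|Z^{k+1} - Z^k\|_F^2 + (\tau^2 - \tau)\|\Lambda^k - \Lambda^{k-1}\|_F^2. \end{aligned} \]

Then, adding $-(\tau^2 - \tau)\|\Lambda^{k+1} - \Lambda^k\|_F^2$ to both sides of the above inequality,
simplifying the resulting inequality and using the fact that $1 + \tau - \tau^2 > 0$ for
$1 < \tau < \frac{1+\sqrt{5}}{2}$, we see that

\[ \begin{aligned} \|\Lambda^{k+1} - \Lambda^k\|_F^2 &\le \frac{\tau^3\lambda_{\max}^2}{1 + \tau - \tau^2}\|Z^{k+1} - Z^k\|_F^2 + \frac{\tau(\tau - 1)}{1 + \tau - \tau^2}\big(\|\Lambda^k - \Lambda^{k-1}\|_F^2 - \|\Lambda^{k+1} - \Lambda^k\|_F^2\big) \\ &= \frac{\tau^3\lambda_{\max}^2}{1 + \tau - \tau^2}\|Z^{k+1} - Z^k\|_F^2 + \frac{\tau^3(\tau - 1)\beta^2}{1 + \tau - \tau^2}\big(\|B(L^k) + C(S^k) - Z^k\|_F^2 \\ &\quad - \|B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\|_F^2\big), \end{aligned} \tag{4.7} \]

where the equality follows from (4.1d).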

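Thus, for $0 < \tau < \frac{1+\sqrt{5}}{2}$, combining (4.6), (4.7) and recalling the definition of $\theta(\tau)$ in
(4.2), we have

\[ \begin{aligned} \frac{1}{\tau\beta}\|\Lambda^{k+1} - \Lambda^k\|_F^2 &\le \max\Big\{\frac{1}{\tau}, \frac{\tau^2}{1 + \tau - \tau^2}\Big\} \cdot \frac{\lambda_{\max}^2}{\beta}\|Z^{k+1} - Z^k\|_F^2 \\ &\quad + \theta(\tau)\beta\big(\|B(L^k) + C(S^k) - Z^k\|_F^2 - \|B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\|_F^2\big). \end{aligned} \tag{4.8} \]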

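Next, note that the function $Z \mapsto \mathcal{L}_\beta(L^{k+1}, S^{k+1}, Z, \Lambda^k)$ is strongly convex with
modulus at least $\lambda_{\min} + \beta$. Using this fact and the definition of $Z^{k+1}$ as a minimizer
in (3.2c), we see that

\[ \begin{aligned} &\Theta_{\tau,\beta}(L^{k+1}, S^{k+1}, Z^{k+1}, \Lambda^k) - \Theta_{\tau,\beta}(L^{k+1}, S^{k+1}, Z^k, \Lambda^k) \\ &\quad = \mathcal{L}_\beta(L^{k+1}, S^{k+1}, Z^{k+1}, \Lambda^k) - \mathcal{L}_\beta(L^{k+1}, S^{k+1}, Z^k, \Lambda^k) \\ &\qquad + \theta(\tau)\beta\big(\|B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\|_F^2 - \|B(L^{k+1}) + C(S^{k+1}) - Z^k\|_F^2\big) \\ &\quad \le \theta(\tau)\beta\big(\|B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\|_F^2 - \|B(L^{k+1}) + C(S^{k+1}) - Z^k\|_F^2\big) \\ &\qquad - \frac{\lambda_{\min} + \beta}{2}\|Z^{k+1} - Z^k\|_F^2. \end{aligned} \tag{4.9} \]

Moreover, using the fact that $S^{k+1}$ is a minimizer in (3.2b), we have

\[ \begin{aligned} &\Theta_{\tau,\beta}(L^{k+1}, S^{k+1}, Z^k, \Lambda^k) - \Theta_{\tau,\beta}(L^{k+1}, S^k, Z^k, \Lambda^k) \\ &\quad = \mathcal{L}_\beta(L^{k+1}, S^{k+1}, Z^k, \Lambda^k) - \mathcal{L}_\beta(L^{k+1}, S^k, Z^k, \Lambda^k) \\ &\qquad + \theta(\tau)\beta\big(\|B(L^{k+1}) + C(S^{k+1}) - Z^k\|_F^2 - \|B(L^{k+1}) + C(S^k) - Z^k\|_F^2\big) \\ &\quad \le \theta(\tau)\beta\big(\|B(L^{k+1}) + C(S^{k+1}) - Z^k\|_F^2 - \|B(L^{k+1}) + C(S^k) - Z^k\|_F^2\big). \end{aligned} \tag{4.10} \]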


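Finally, note that $L \mapsto \mathcal{L}_\beta(L, S^k, Z^k, \Lambda^k)$ is strongly convex with modulus at least
$\sigma\beta$ from (a1) in Assumption 4.1. From this, we can similarly obtain

\[ \begin{aligned} &\Theta_{\tau,\beta}(L^{k+1}, S^k, Z^k, \Lambda^k) - \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) \\ &\quad \le \theta(\tau)\beta\big(\|B(L^{k+1}) + C(S^k) - Z^k\|_F^2 - \|B(L^k) + C(S^k) - Z^k\|_F^2\big) - \frac{\sigma\beta}{2}\|L^{k+1} - L^k\|_F^2. \end{aligned} \tag{4.11} \]

Thus, summing (4.4), (4.8), (4.9), (4.10) and (4.11), we obtain (4.3).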

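Now, suppose in addition that $\beta > \frac{1}{2}\Big(-\lambda_{\min} + \sqrt{\lambda_{\min}^2 + \max\big\{\frac{1}{\tau}, \frac{\tau^2}{1 + \tau - \tau^2}\big\} \cdot 8\lambda_{\max}^2}\Big)$.
Then it is easy to check that

\[ \max\Big\{\frac{1}{\tau}, \frac{\tau^2}{1 + \tau - \tau^2}\Big\} \cdot \frac{\lambda_{\max}^2}{\beta} - \frac{\lambda_{\min} + \beta}{2} < 0. \]

Hence we see from (4.3) that

\[ \Theta_{\tau,\beta}(L^{k+1}, S^{k+1}, Z^{k+1}, \Lambda^{k+1}) - \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) \le 0, \]

which means that $\{\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ is decreasing. This completes the proof. □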
We next show that the sequence generated by Algorithm 1 is bounded if $\beta$ is
chosen above a computable threshold, under (a1) and (a3) in Assumption 4.1. For
notational simplicity, from now on, we let

\[ \bar{\beta} := \max\Big\{\max\{1/\tau, \tau\} \cdot \lambda_{\max},\ \frac{1}{2}\Big(-\lambda_{\min} + \sqrt{\lambda_{\min}^2 + \max\Big\{\frac{1}{\tau}, \frac{\tau^2}{1 + \tau - \tau^2}\Big\} \cdot 8\lambda_{\max}^2}\Big)\Big\}. \tag{4.12} \]

Proposition 4.2 (Boundedness of sequence generated by ADMM). Suppose
that $0 < \tau < \frac{1+\sqrt{5}}{2}$ and $\beta > \bar{\beta}$. If (a1) and (a3) in Assumption 4.1 hold, then a
sequence generated by Algorithm 1 is bounded.

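Proof. With our choice of $\beta$ and (a1) in Assumption 4.1, we see immediately from
Lemma 4.1 that the sequence $\{\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ is decreasing. This together
with (a3) in Assumption 4.1 shows that, for $k \ge 1$,

\[ \begin{aligned} h_0 &> \Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^1) \ge \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) \\ &= \Psi(L^k) + \Phi(S^k) + \frac{1}{2}\|D - A(Z^k)\|_F^2 - \langle \Lambda^k, B(L^k) + C(S^k) - Z^k\rangle \\ &\quad + (1 + 2\theta(\tau))\frac{\beta}{2}\|B(L^k) + C(S^k) - Z^k\|_F^2 \\ &= \Psi(L^k) + \Phi(S^k) + \frac{1}{2}\|D - A(Z^k)\|_F^2 + \frac{\beta}{2}\Big\|B(L^k) + C(S^k) - Z^k - \frac{1}{\beta}\Lambda^k\Big\|_F^2 \\ &\quad - \frac{1}{2\beta}\|\Lambda^k\|_F^2 + \theta(\tau)\beta\|B(L^k) + C(S^k) - Z^k\|_F^2, \end{aligned} \tag{4.13} \]

where the last equality is obtained by completing the square. We next derive an upper
bound for $\|\Lambda^k\|_F^2$. We start by substituting (4.1d) into (4.1c) and rearranging terms
to obtain

\[ \begin{aligned} 0 &= A^*\big(A(Z^k) - D\big) + \Lambda^{k-1} + \frac{1}{\tau}(\Lambda^k - \Lambda^{k-1}) \\ \Longrightarrow\ -\tau\Lambda^k &= \tau A^*\big(A(Z^k) - D\big) + (1 - \tau)(\Lambda^k - \Lambda^{k-1}). \end{aligned} \tag{4.14} \]

We now consider two different cases:

• For $0 < \tau \le 1$, it follows from the convexity of $\|\cdot\|_F^2$ and (4.14) that

\[ \begin{aligned} \|-\tau\Lambda^k\|_F^2 &\le \tau\|A^*\big(A(Z^k) - D\big)\|_F^2 + (1 - \tau)\|\Lambda^k - \Lambda^{k-1}\|_F^2 \\ &\le \tau\lambda_{\max}\|A(Z^k) - D\|_F^2 + (1 - \tau)\|\Lambda^k - \Lambda^{k-1}\|_F^2 \\ &= \tau\lambda_{\max}\|A(Z^k) - D\|_F^2 + (1 - \tau)\tau^2\beta^2\|B(L^k) + C(S^k) - Z^k\|_F^2, \end{aligned} \]

where the equality follows from (4.1d). Then, we have

\[ \|\Lambda^k\|_F^2 \le \frac{\lambda_{\max}}{\tau}\|A(Z^k) - D\|_F^2 + (1 - \tau)\beta^2\|B(L^k) + C(S^k) - Z^k\|_F^2. \tag{4.15} \]

• For $1 < \tau < \frac{1+\sqrt{5}}{2}$, by dividing $-\tau$ from both sides of (4.14), we obtain

\[ \Lambda^k = \frac{1}{\tau} \cdot \tau A^*\big(D - A(Z^k)\big) + \Big(1 - \frac{1}{\tau}\Big)(\Lambda^k - \Lambda^{k-1}). \]

Then, since $0 < \frac{1}{\tau} < 1$, using the convexity of $\|\cdot\|_F^2$ and (4.1d), we have

\[ \begin{aligned} \|\Lambda^k\|_F^2 &\le \frac{1}{\tau}\|\tau A^*\big(D - A(Z^k)\big)\|_F^2 + \Big(1 - \frac{1}{\tau}\Big)\|\Lambda^k - \Lambda^{k-1}\|_F^2 \\ &\le \tau\lambda_{\max}\|D - A(Z^k)\|_F^2 + (\tau - 1)\tau\beta^2\|B(L^k) + C(S^k) - Z^k\|_F^2. \end{aligned} \tag{4.16} \]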
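Thus, combining (4.15) and (4.16), we have

\[ \begin{aligned} \|\Lambda^k\|_F^2 &\le \max\{1/\tau, \tau\} \cdot \lambda_{\max}\|D - A(Z^k)\|_F^2 + \max\{1 - \tau, (\tau - 1)\tau\}\beta^2\|B(L^k) + C(S^k) - Z^k\|_F^2 \\ \Longrightarrow\ -\frac{1}{2\beta}\|\Lambda^k\|_F^2 &\ge -\max\{1/\tau, \tau\} \cdot \frac{\lambda_{\max}}{2\beta}\|D - A(Z^k)\|_F^2 \\ &\quad - \frac{\max\{1 - \tau, (\tau - 1)\tau\}\beta}{2}\|B(L^k) + C(S^k) - Z^k\|_F^2. \end{aligned} \tag{4.17} \]

Substituting (4.17) into (4.13), we have

\[ \begin{aligned} h_0 &> \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) \ge \Psi(L^k) + \Phi(S^k) \\ &\quad + \frac{1}{2}\Big(1 - \max\{1/\tau, \tau\} \cdot \frac{\lambda_{\max}}{\beta}\Big)\|D - A(Z^k)\|_F^2 \\ &\quad + \frac{\beta}{2}\Big\|B(L^k) + C(S^k) - Z^k - \frac{1}{\beta}\Lambda^k\Big\|_F^2 \\ &\quad + \big[2\theta(\tau) - \max\{1 - \tau, (\tau - 1)\tau\}\big] \cdot \frac{\beta}{2}\|B(L^k) + C(S^k) - Z^k\|_F^2. \end{aligned} \tag{4.18} \]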

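With (4.18) established, we are now ready to prove the boundedness of the sequence.
We start with the observation that for $0 < \tau < \frac{1+\sqrt{5}}{2}$ and $\beta > \bar{\beta}$, we always
have

\[ 1 - \max\{1/\tau, \tau\} \cdot \frac{\lambda_{\max}}{\beta} > 0 \tag{4.19} \]

and

\[ 2\theta(\tau) - \max\{1 - \tau, (\tau - 1)\tau\} = \begin{cases} 1 - \tau > 0, & \text{for } 0 < \tau < 1, \\ 0, & \text{for } \tau = 1, \\ \dfrac{\tau(\tau - 1)(\tau^2 + \tau - 1)}{1 + \tau - \tau^2} > 0, & \text{for } 1 < \tau < \frac{1+\sqrt{5}}{2}, \end{cases} \tag{4.20} \]

where $\theta(\tau)$ is defined in (4.2). Then we consider two cases:

• For $\tau \in (0, 1) \cup \big(1, \frac{1+\sqrt{5}}{2}\big)$, it follows from (4.18), (4.19), (4.20), and the
nonnegativity of $\Psi$ and $\Phi$ that $\{\|D - A(Z^k)\|_F\}$, $\{\|B(L^k) + C(S^k) - Z^k - \frac{1}{\beta}\Lambda^k\|_F\}$
and $\{\|B(L^k) + C(S^k) - Z^k\|_F\}$ are bounded; and moreover,

\[ \Psi(L^k) + \Phi(S^k) < h_0. \]

The boundedness of $\{L^k\}$ and $\{S^k\}$ follows immediately from this last relation.
Furthermore, $\{\Lambda^k\}$ is bounded since

\[ \|\Lambda^k\|_F \le \beta\Big\|B(L^k) + C(S^k) - Z^k - \frac{1}{\beta}\Lambda^k\Big\|_F + \beta\|B(L^k) + C(S^k) - Z^k\|_F. \]

Finally, we obtain the boundedness of $\{Z^k\}$ from

\[ \|Z^k\|_F \le \Big\|B(L^k) + C(S^k) - Z^k - \frac{1}{\beta}\Lambda^k\Big\|_F + \|B(L^k)\|_F + \|C(S^k)\|_F + \frac{1}{\beta}\|\Lambda^k\|_F. \tag{4.21} \]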
• For $\tau = 1$, it follows from (4.18), (4.19), (4.20), and the nonnegativity of
$\Psi$ and $\Phi$ that $\{\|D - A(Z^k)\|_F\}$ and $\{\|B(L^k) + C(S^k) - Z^k - \frac{1}{\beta}\Lambda^k\|_F\}$ are
bounded; and moreover $\Psi(L^k) + \Phi(S^k) < h_0$, from which we see immediately
that $\{L^k\}$ and $\{S^k\}$ are bounded. The boundedness of $\{\Lambda^k\}$ now follows from
(4.14) with $\tau = 1$, i.e., $\Lambda^k = A^*(D - A(Z^k))$. The boundedness of $\{Z^k\}$ again
follows from (4.21).

This completes the proof. □


We are now ready to prove our first global convergence result for Algorithm 1, 
which also characterizes the cluster point of the sequence generated. 

Theorem 4.3 (Global subsequential convergence). Suppose that $0 < \tau < \frac{1+\sqrt{5}}{2}$
and $\beta > \bar{\beta}$. If Assumption 4.1 holds, then

(i) $\lim_{k \to \infty} \|L^{k+1} - L^k\|_F + \|S^{k+1} - S^k\|_F + \|Z^{k+1} - Z^k\|_F + \|\Lambda^{k+1} - \Lambda^k\|_F = 0$;

(ii) Any cluster point $(L^*, S^*, Z^*, \Lambda^*)$ of a sequence $\{(L^k, S^k, Z^k, \Lambda^k)\}$ generated by
Algorithm 1 is a stationary point of (1.1).

Proof. The boundedness of the sequence $\{(L^k, S^k, Z^k, \Lambda^k)\}$ follows immediately from
Proposition 4.2 and thus a cluster point exists. We now prove statement (i).

Suppose that $(L^*, S^*, Z^*, \Lambda^*)$ is a cluster point of the sequence $\{(L^k, S^k, Z^k, \Lambda^k)\}$
and let $\{(L^{k_i}, S^{k_i}, Z^{k_i}, \Lambda^{k_i})\}$ be a convergent subsequence such that

\[ \lim_{i \to \infty} (L^{k_i}, S^{k_i}, Z^{k_i}, \Lambda^{k_i}) = (L^*, S^*, Z^*, \Lambda^*). \]
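By summing (4.3) from $k = 1$ to $k = k_i - 1$, we have

\[ \begin{aligned} &\Theta_{\tau,\beta}(L^{k_i}, S^{k_i}, Z^{k_i}, \Lambda^{k_i}) - \Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^1) \\ &\quad \le -C\sum_{k=1}^{k_i - 1}\|Z^{k+1} - Z^k\|_F^2 - \frac{\sigma\beta}{2}\sum_{k=1}^{k_i - 1}\|L^{k+1} - L^k\|_F^2, \end{aligned} \tag{4.22} \]

where $C := \frac{\lambda_{\min} + \beta}{2} - \max\big\{\frac{1}{\tau}, \frac{\tau^2}{1 + \tau - \tau^2}\big\} \cdot \frac{\lambda_{\max}^2}{\beta} > 0$ (since $\beta > \bar{\beta}$). Passing to the limit
in (4.22) and rearranging terms in the resulting relation, we obtain

\[ C\sum_{k=1}^{\infty}\|Z^{k+1} - Z^k\|_F^2 + \frac{\sigma\beta}{2}\sum_{k=1}^{\infty}\|L^{k+1} - L^k\|_F^2 \le \Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^1) - \Theta_{\tau,\beta}(L^*, S^*, Z^*, \Lambda^*) < \infty, \]

where the last inequality follows from the properness of $\Psi$ and $\Phi$. This together with
$C > 0$ and $\sigma > 0$ implies that

\[ \sum_{k=1}^{\infty}\|Z^{k+1} - Z^k\|_F^2 < \infty \quad \text{and} \quad \sum_{k=1}^{\infty}\|L^{k+1} - L^k\|_F^2 < \infty. \]

Hence, we have

\[ Z^{k+1} - Z^k \to 0, \qquad L^{k+1} - L^k \to 0. \tag{4.23} \]

Next, by summing both sides of (4.8) from $k = 1$ to $k = k_i$ and passing to the limit,
we have

\[ \begin{aligned} \sum_{k=1}^{\infty}\|\Lambda^{k+1} - \Lambda^k\|_F^2 &\le \max\Big\{\frac{1}{\tau}, \frac{\tau^2}{1 + \tau - \tau^2}\Big\} \cdot \tau\lambda_{\max}^2\sum_{k=1}^{\infty}\|Z^{k+1} - Z^k\|_F^2 \\ &\quad + \theta(\tau)\tau\beta^2\Big(\|B(L^1) + C(S^1) - Z^1\|_F^2 - \liminf_{k \to \infty}\|B(L^{k+1}) + C(S^{k+1}) - Z^{k+1}\|_F^2\Big), \end{aligned} \]

from which we conclude that

\[ \Lambda^{k+1} - \Lambda^k \to 0. \tag{4.24} \]

Finally, we have $S^{k+1} - S^k \to 0$ from (4.23), (4.24), (4.1d) and (a1) in Assumption
4.1. This proves statement (i).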

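We next prove statement (ii). From the lower semicontinuity of $\Theta_{\tau,\beta}$ (since $\Psi$ and $\Phi$ are lower semicontinuous), we have

$$\liminf_{i\to\infty} \Theta_{\tau,\beta}(L^{k_i+1}, S^{k_i+1}, Z^{k_i}, \Lambda^{k_i}) \ge \Psi(L^*) + \Phi(S^*) + \frac{1}{2}\|D - \mathcal{A}(Z^*)\|_F^2 - \langle \Lambda^*, \mathcal{B}(L^*)+\mathcal{C}(S^*)-Z^* \rangle + (1+2\theta(\tau))\frac{\beta}{2}\|\mathcal{B}(L^*)+\mathcal{C}(S^*)-Z^*\|_F^2. \tag{4.25}$$

On the other hand, from the definition of $S^{k_i+1}$ as a minimizer in (3.2b), we have

$$\Theta_{\tau,\beta}(L^{k_i+1}, S^{k_i+1}, Z^{k_i}, \Lambda^{k_i}) \le \Theta_{\tau,\beta}(L^{k_i+1}, S^*, Z^{k_i}, \Lambda^{k_i}) + \theta(\tau)\beta \left( \|\mathcal{B}(L^{k_i+1})+\mathcal{C}(S^{k_i+1})-Z^{k_i}\|_F^2 - \|\mathcal{B}(L^{k_i+1})+\mathcal{C}(S^*)-Z^{k_i}\|_F^2 \right).$$

Taking the limit in the above inequality, and invoking statement (i) and (a2) in Assumption 4.1, we see that

$$\limsup_{i\to\infty} \Theta_{\tau,\beta}(L^{k_i+1}, S^{k_i+1}, Z^{k_i}, \Lambda^{k_i}) \le \Psi(L^*) + \Phi(S^*) + \frac{1}{2}\|D - \mathcal{A}(Z^*)\|_F^2 - \langle \Lambda^*, \mathcal{B}(L^*)+\mathcal{C}(S^*)-Z^* \rangle + (1+2\theta(\tau))\frac{\beta}{2}\|\mathcal{B}(L^*)+\mathcal{C}(S^*)-Z^*\|_F^2. \tag{4.26}$$

Then, combining (4.25) and (4.26), we see that

$$\lim_{i\to\infty} \Theta_{\tau,\beta}(L^{k_i+1}, S^{k_i+1}, Z^{k_i}, \Lambda^{k_i}) = \Psi(L^*) + \Phi(S^*) + \frac{1}{2}\|D - \mathcal{A}(Z^*)\|_F^2 - \langle \Lambda^*, \mathcal{B}(L^*)+\mathcal{C}(S^*)-Z^* \rangle + (1+2\theta(\tau))\frac{\beta}{2}\|\mathcal{B}(L^*)+\mathcal{C}(S^*)-Z^*\|_F^2,$$

which, together with (a2) in Assumption 4.1, $L^{k_i+1} - L^{k_i} \to 0$, $S^{k_i+1} - S^{k_i} \to 0$ and the definition of $\Theta_{\tau,\beta}$, implies that

$$\lim_{i\to\infty} \Phi(S^{k_i+1}) = \Phi(S^*). \tag{4.27}$$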

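Thus, passing to the limit in (4.1a)-(4.1d) along $\{(L^{k_i}, S^{k_i}, Z^{k_i}, \Lambda^{k_i})\}$ and invoking statement (i), (4.27) and (2.1), we see that

$$\begin{cases}
0 \in \partial\Psi(L^*) - \mathcal{B}^*(\Lambda^*) + \beta\mathcal{B}^*(\mathcal{B}(L^*)+\mathcal{C}(S^*)-Z^*), \\
0 \in \partial\Phi(S^*) - \mathcal{C}^*(\Lambda^*) + \beta\mathcal{C}^*(\mathcal{B}(L^*)+\mathcal{C}(S^*)-Z^*), \\
0 = \mathcal{A}^*(\mathcal{A}(Z^*)-D) + \Lambda^* - \beta(\mathcal{B}(L^*)+\mathcal{C}(S^*)-Z^*), \\
\mathcal{B}(L^*)+\mathcal{C}(S^*) = Z^*.
\end{cases} \tag{4.28}$$

Rearranging terms in (4.28), it is not hard to obtain

$$\begin{cases}
0 \in \partial\Psi(L^*) + \mathcal{B}^*\mathcal{A}^*(\mathcal{A}(\mathcal{B}(L^*)+\mathcal{C}(S^*)) - D), \\
0 \in \partial\Phi(S^*) + \mathcal{C}^*\mathcal{A}^*(\mathcal{A}(\mathcal{B}(L^*)+\mathcal{C}(S^*)) - D).
\end{cases}$$

This shows that $(L^*, S^*, Z^*, \Lambda^*)$ is a stationary point of (1.1). This completes the proof. □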

Remark 4.2 (Comments on the computable threshold). From the above discussions, we establish under Assumption 4.1 the convergence of the ADMM with $0 < \tau < \frac{1+\sqrt{5}}{2}$ when the penalty parameter $\beta$ is chosen above a computable threshold $\bar{\beta}$ which depends on $\tau$. The existence of this kind of threshold is also obtained in the recent studies [1, 26, 31, 44, 45] on the nonconvex ADMM and its variants with $\tau = 1$. In Fig. 4.1, we plot $\bar{\beta}$ against $\tau$ with $\mathcal{A}$ being the identity map (hence, $\lambda_{\max} = \lambda_{\min} = 1$). It is not hard to see from Fig. 4.1 that for a given penalty parameter $\beta > 1$, we can always choose a dual step-size $\tau$ from an interval containing 1 so that the corresponding ADMM is convergent.



Fig. 4.1. The computable threshold $\bar{\beta}$ for $0 < \tau < \frac{1+\sqrt{5}}{2}$.

Remark 4.3 (Practical computation consideration on penalty parameter). In computation, for a $0 < \tau < \frac{1+\sqrt{5}}{2}$, the $\bar{\beta}$ in (4.12) may be too large and hence fixing a $\beta$ close to it can lead to slow convergence. As in [32, 43], one could possibly accelerate the algorithm by initializing the algorithm with a small $\beta$ (less than $\bar{\beta}$) and then increasing $\beta$ by a constant ratio until $\beta > \bar{\beta}$ if the sequence generated becomes unbounded or the successive change does not vanish sufficiently fast. Clearly, after at most finitely many increases, the penalty parameter $\beta$ gets above the threshold $\bar{\beta}$ and the convergence of the resulting algorithm is guaranteed by Theorem 4.3 under Assumption 4.1. On the other hand, if $\beta$ is never increased, this means that the successive change goes to zero and the sequence is bounded. Then it is routine to show that any cluster point is a stationary point if $\Phi$ is continuous in its domain.

Under the additional assumption that the potential function $\Theta_{\tau,\beta}$ is a KL function, we show in the next theorem that the whole sequence generated by Algorithm 1 is convergent if $\beta$ is greater than a computable threshold, again under Assumption 4.1. Our proof makes use of the uniformized KL property; see Proposition 2.2. This technique was previously used in [5] to prove the convergence of the proximal alternating linearized minimization algorithm for nonconvex and nonsmooth problems, and later in [44, 45] to prove the global convergence of the Bregman ADMM with $\tau = 1$. Although our analysis follows a line of argument similar to that in [44, 45], it is much more intricate. This is because when $\tau \neq 1$, the successive change in the dual variable cannot be controlled solely by the successive changes in the primal variables.

Theorem 4.4 (Global convergence of the whole sequence). Let $0 < \tau < \frac{1+\sqrt{5}}{2}$ and $\beta > \bar{\beta}$. Suppose in addition that Assumption 4.1 holds and the potential function $\Theta_{\tau,\beta}$ is a KL function. Then, the sequence $\{(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ generated by Algorithm 1 converges to a stationary point of (1.1).

Proof. In view of Theorem 4.3, we only need to show that the sequence is convergent. We start by noting from (4.18), (4.19) and (4.20) that $\{\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ is bounded below. Since this sequence is also decreasing from Theorem 4.1, we conclude that $\lim_{k\to\infty} \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) =: \theta^*$ exists. In the following, we will consider two cases.

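Case 1) Suppose first that $\Theta_{\tau,\beta}(L^N, S^N, Z^N, \Lambda^N) = \theta^*$ for some $N \ge 1$. Since $\{\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ is decreasing, we must have $\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) = \theta^*$ for all $k \ge N$. Then, it follows from (4.3) that $L^{N+t} = L^N$ and $Z^{N+t} = Z^N$ for all $t \ge 0$. Hence, $\{L^k\}$ and $\{Z^k\}$ converge finitely. Moreover, from (4.5), we have

$$\|\Lambda^{k+1}-\Lambda^k\|_F = |1-\tau| \cdot \|\Lambda^k-\Lambda^{k-1}\|_F = \cdots = |1-\tau|^{k-N+1} \cdot \|\Lambda^N-\Lambda^{N-1}\|_F$$

for all $k \ge N$. Since $0 < \tau < \frac{1+\sqrt{5}}{2}$, we have $0 < 1 - |1-\tau| \le 1$ and hence we see further that

$$\sum_{k=N}^{\infty} \|\Lambda^{k+1}-\Lambda^k\|_F \le \frac{|1-\tau|}{1-|1-\tau|} \|\Lambda^N-\Lambda^{N-1}\|_F < \infty, \tag{4.29}$$

which implies the convergence of $\{\Lambda^k\}$. Additionally, for all $k \ge N$, we have

$$\|S^{k+1}-S^k\|_F \le \frac{1}{\sqrt{\sigma'}} \|\mathcal{C}(S^{k+1})-\mathcal{C}(S^k)\|_F = \frac{1}{\sqrt{\sigma'}} \left\| -\frac{1}{\tau\beta}(\Lambda^{k+1}-\Lambda^k) + \frac{1}{\tau\beta}(\Lambda^k-\Lambda^{k-1}) \right\|_F \le \frac{1}{\tau\beta\sqrt{\sigma'}} \left( \|\Lambda^{k+1}-\Lambda^k\|_F + \|\Lambda^k-\Lambda^{k-1}\|_F \right),$$

where the first inequality follows from (a1) in Assumption 4.1 and the equality follows from (4.1d) together with $L^{k+1} = L^k$ and $Z^{k+1} = Z^k$. This together with (4.29) implies that $\sum_{k=N}^{\infty} \|S^{k+1}-S^k\|_F < \infty$. Thus, $\{S^k\}$ is also convergent. Consequently, we see that $\{(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ is a convergent sequence in this case.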

Case 2) From now on, we consider the case where $\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) > \theta^*$ for all $k \ge 1$. In this case, we will divide the proof into three steps: 1. we first prove that $\Theta_{\tau,\beta}$ is constant on the set of cluster points of the sequence $\{(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ and then apply the uniformized KL property; 2. we bound the distance from 0 to $\partial\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)$; 3. we show that the sequence $\{(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ is a Cauchy sequence and hence is convergent. The complete proof is presented as follows.

Step 1. We recall from Proposition 4.2 that the sequence $\{(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ generated by Algorithm 1 is bounded and hence must have at least one cluster point. Let $\Gamma$ denote the set of cluster points of $\{(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$. We will show that $\Theta_{\tau,\beta}$ is constant on $\Gamma$.
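To this end, take any $(L^*, S^*, Z^*, \Lambda^*) \in \Gamma$ and consider a convergent subsequence $\{(L^{k_i}, S^{k_i}, Z^{k_i}, \Lambda^{k_i})\}$ with $\lim_{i\to\infty} (L^{k_i}, S^{k_i}, Z^{k_i}, \Lambda^{k_i}) = (L^*, S^*, Z^*, \Lambda^*)$. Then from the lower semicontinuity of $\Theta_{\tau,\beta}$ (since $\Psi$ and $\Phi$ are lower semicontinuous) and the definition of $\theta^*$, we have

$$\theta^* = \lim_{i\to\infty} \Theta_{\tau,\beta}(L^{k_i}, S^{k_i}, Z^{k_i}, \Lambda^{k_i}) \ge \Theta_{\tau,\beta}(L^*, S^*, Z^*, \Lambda^*). \tag{4.30}$$

On the other hand, notice from the definition of $S^{k_i}$ as a minimizer in (3.2b) that

$$\Theta_{\tau,\beta}(L^{k_i}, S^{k_i}, Z^{k_i-1}, \Lambda^{k_i-1}) \le \Theta_{\tau,\beta}(L^{k_i}, S^*, Z^{k_i-1}, \Lambda^{k_i-1}) + \theta(\tau)\beta \left( \|\mathcal{B}(L^{k_i})+\mathcal{C}(S^{k_i})-Z^{k_i-1}\|_F^2 - \|\mathcal{B}(L^{k_i})+\mathcal{C}(S^*)-Z^{k_i-1}\|_F^2 \right).$$

This together with Theorem 4.3(i), the continuity of $\Theta_{\tau,\beta}$ with respect to $L$ (from (a2) in Assumption 4.1), $Z$ and $\Lambda$, and the definition of $\theta^*$ implies that

$$\theta^* = \lim_{i\to\infty} \Theta_{\tau,\beta}(L^{k_i}, S^{k_i}, Z^{k_i}, \Lambda^{k_i}) \le \Theta_{\tau,\beta}(L^*, S^*, Z^*, \Lambda^*). \tag{4.31}$$

Combining (4.30) and (4.31), we conclude that $\Theta_{\tau,\beta}(L^*, S^*, Z^*, \Lambda^*) = \theta^*$. Since $(L^*, S^*, Z^*, \Lambda^*) \in \Gamma$ is arbitrary, we conclude further that the potential function $\Theta_{\tau,\beta}$ is constant on $\Gamma$.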

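The fact that $\Theta_{\tau,\beta} \equiv \theta^*$ on $\Gamma$ together with our assumption that $\Theta_{\tau,\beta}$ is a KL function and Proposition 2.2 implies that there exist $\epsilon > 0$, $\eta > 0$ and $\varphi \in \Xi_\eta$ such that

$$\varphi'\left(\Theta_{\tau,\beta}(L, S, Z, \Lambda) - \theta^*\right) \mathrm{dist}\left(0, \partial\Theta_{\tau,\beta}(L, S, Z, \Lambda)\right) \ge 1$$

for all $(L, S, Z, \Lambda)$ satisfying $\mathrm{dist}((L, S, Z, \Lambda), \Gamma) < \epsilon$ and $\theta^* < \Theta_{\tau,\beta}(L, S, Z, \Lambda) < \theta^* + \eta$. On the other hand, since $\lim_{k\to\infty} \mathrm{dist}((L^k, S^k, Z^k, \Lambda^k), \Gamma) = 0$ by the definition of $\Gamma$, and $\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) \to \theta^*$, then for such $\epsilon$ and $\eta$, there exists $k_1 \ge 3$ such that $\mathrm{dist}((L^k, S^k, Z^k, \Lambda^k), \Gamma) < \epsilon$ and $\theta^* < \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) < \theta^* + \eta$ for all $k \ge k_1$. Thus, for $k \ge k_1$, we have

$$\varphi'\left(\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) - \theta^*\right) \mathrm{dist}\left(0, \partial\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)\right) \ge 1. \tag{4.32}$$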

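Step 2. We next consider the subdifferential $\partial\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)$. Looking at the partial subdifferential with respect to $L$, we have

$$\begin{aligned}
&\partial_L \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) \\
&= \partial\Psi(L^k) - \mathcal{B}^*(\Lambda^k) + (1+2\theta(\tau))\beta\mathcal{B}^*(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) \\
&= \partial\Psi(L^k) - \mathcal{B}^*(\Lambda^{k-1}) + \beta\mathcal{B}^*(\mathcal{B}(L^k)+\mathcal{C}(S^{k-1})-Z^{k-1}) + 2\theta(\tau)\beta\mathcal{B}^*(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) \\
&\quad - \mathcal{B}^*(\Lambda^k-\Lambda^{k-1}) + \beta\mathcal{B}^*(\mathcal{C}(S^k)-Z^k-\mathcal{C}(S^{k-1})+Z^{k-1}) \\
&\ni 2\theta(\tau)\beta\mathcal{B}^*(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) - \mathcal{B}^*(\Lambda^k-\Lambda^{k-1}) + \beta\mathcal{B}^*(\mathcal{C}(S^k)-Z^k-\mathcal{C}(S^{k-1})+Z^{k-1}) \\
&\overset{(i)}{=} -\left(1+\tfrac{2\theta(\tau)}{\tau}\right)\mathcal{B}^*(\Lambda^k-\Lambda^{k-1}) + \beta\mathcal{B}^*\left[(\mathcal{C}(S^k)-Z^k) - (\mathcal{C}(S^{k-1})-Z^{k-1})\right] \\
&\overset{(ii)}{=} -\left(1+\tfrac{2\theta(\tau)}{\tau}\right)\mathcal{B}^*(\Lambda^k-\Lambda^{k-1}) + \beta\mathcal{B}^*\left[\left(-\mathcal{B}(L^k) - \tfrac{\Lambda^k-\Lambda^{k-1}}{\tau\beta}\right) - \left(-\mathcal{B}(L^{k-1}) - \tfrac{\Lambda^{k-1}-\Lambda^{k-2}}{\tau\beta}\right)\right] \\
&= -\left(1+\tfrac{2\theta(\tau)+1}{\tau}\right)\mathcal{B}^*(\Lambda^k-\Lambda^{k-1}) + \tfrac{1}{\tau}\mathcal{B}^*(\Lambda^{k-1}-\Lambda^{k-2}) - \beta\mathcal{B}^*\mathcal{B}(L^k-L^{k-1}),
\end{aligned}$$

where the inclusion follows from (4.1a), and the equalities (i) and (ii) follow from (4.1d). Similarly,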


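$$\begin{aligned}
&\partial_S \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) \\
&= \partial\Phi(S^k) - \mathcal{C}^*(\Lambda^k) + (1+2\theta(\tau))\beta\mathcal{C}^*(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) \\
&= \partial\Phi(S^k) - \mathcal{C}^*(\Lambda^{k-1}) + \beta\mathcal{C}^*(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^{k-1}) \\
&\quad + 2\theta(\tau)\beta\mathcal{C}^*(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) - \mathcal{C}^*(\Lambda^k-\Lambda^{k-1}) - \beta\mathcal{C}^*(Z^k-Z^{k-1}) \\
&\ni 2\theta(\tau)\beta\mathcal{C}^*(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) - \mathcal{C}^*(\Lambda^k-\Lambda^{k-1}) - \beta\mathcal{C}^*(Z^k-Z^{k-1}) \\
&= -\left(1+\tfrac{2\theta(\tau)}{\tau}\right)\mathcal{C}^*(\Lambda^k-\Lambda^{k-1}) - \beta\mathcal{C}^*(Z^k-Z^{k-1}),
\end{aligned}$$

where the inclusion follows from (4.1b) and the last equality follows from (4.1d). Moreover,

$$\begin{aligned}
&\nabla_Z \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) \\
&= \mathcal{A}^*(\mathcal{A}(Z^k)-D) + \Lambda^k - \beta(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) - 2\theta(\tau)\beta(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) \\
&= \mathcal{A}^*(\mathcal{A}(Z^k)-D) + \Lambda^{k-1} - \beta(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) - 2\theta(\tau)\beta(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) + (\Lambda^k-\Lambda^{k-1}) \\
&= -2\theta(\tau)\beta(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) + (\Lambda^k-\Lambda^{k-1}) \\
&= \left(1+\tfrac{2\theta(\tau)}{\tau}\right)(\Lambda^k-\Lambda^{k-1}),
\end{aligned}$$

where the third equality follows from (4.1c) and the last equality follows from (4.1d). Finally,

$$\nabla_\Lambda \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) = -(\mathcal{B}(L^k)+\mathcal{C}(S^k)-Z^k) = \frac{1}{\tau\beta}(\Lambda^k-\Lambda^{k-1}),$$

where the last equality follows from (4.1d). Thus, from the above relations, there exists $a > 0$ so that

$$\mathrm{dist}\left(0, \partial\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)\right) \le a \left( \|L^k-L^{k-1}\|_F + \|Z^k-Z^{k-1}\|_F + \|\Lambda^k-\Lambda^{k-1}\|_F + \|\Lambda^{k-1}-\Lambda^{k-2}\|_F \right). \tag{4.33}$$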


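Step 3. We now prove the convergence of the sequence by combining (4.33) with (4.32). For notational simplicity, define

$$\Delta_k := \varphi\left(\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) - \theta^*\right) - \varphi\left(\Theta_{\tau,\beta}(L^{k+1}, S^{k+1}, Z^{k+1}, \Lambda^{k+1}) - \theta^*\right).$$

Since $\Theta_{\tau,\beta}$ is decreasing along the sequence and $\varphi$ is monotonic, it is easy to see that $\Delta_k \ge 0$ for $k \ge 1$. Then we have for all $k \ge k_1$ that

$$\begin{aligned}
&a \left( \|L^k-L^{k-1}\|_F + \|Z^k-Z^{k-1}\|_F + \|\Lambda^k-\Lambda^{k-1}\|_F + \|\Lambda^{k-1}-\Lambda^{k-2}\|_F \right) \cdot \Delta_k \\
&\ge \mathrm{dist}\left(0, \partial\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)\right) \cdot \Delta_k \\
&\ge \mathrm{dist}\left(0, \partial\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k)\right) \cdot \varphi'\left(\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) - \theta^*\right) \cdot \left[\Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) - \Theta_{\tau,\beta}(L^{k+1}, S^{k+1}, Z^{k+1}, \Lambda^{k+1})\right] \\
&\ge \Theta_{\tau,\beta}(L^k, S^k, Z^k, \Lambda^k) - \Theta_{\tau,\beta}(L^{k+1}, S^{k+1}, Z^{k+1}, \Lambda^{k+1}) \\
&\ge b_1 \|L^{k+1}-L^k\|_F^2 + b_2 \|Z^{k+1}-Z^k\|_F^2 \\
&\ge \tfrac{1}{2}\min\{b_1, b_2\} \cdot \left[ \|L^{k+1}-L^k\|_F + \|Z^{k+1}-Z^k\|_F \right]^2,
\end{aligned} \tag{4.34}$$

where the first inequality follows from (4.33), the second inequality follows from the concavity of $\varphi$, the third inequality follows from (4.32), and the fourth inequality follows from (4.3) with $b_1 := \frac{\sigma}{2}$ and $b_2 := \frac{\lambda_{\min}+\beta}{2} - \max\left\{1, \frac{\tau^3}{1+\tau-\tau^2}\right\} \cdot \frac{\lambda_{\max}^2}{\tau\beta}$.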

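Dividing both sides of (4.34) by $c := \frac{1}{2}\min\{b_1, b_2\}$, taking the square root and using the inequality $\sqrt{uv} \le \frac{u+v}{2}$ for $u, v \ge 0$ to further upper bound the left hand side of the resulting inequality, we obtain that

$$\frac{1}{2\gamma} \left( \|L^k-L^{k-1}\|_F + \|Z^k-Z^{k-1}\|_F + \|\Lambda^k-\Lambda^{k-1}\|_F + \|\Lambda^{k-1}-\Lambda^{k-2}\|_F \right) + \frac{a\gamma}{2c}\Delta_k \ge \|L^{k+1}-L^k\|_F + \|Z^{k+1}-Z^k\|_F, \tag{4.35}$$

where $\gamma$ is an arbitrary positive constant. On the other hand, it follows from (4.5) that

$$\|\Lambda^k-\Lambda^{k-1}\|_F = \|\tau\mathcal{A}^*\mathcal{A}(Z^{k-1}-Z^k) + (1-\tau)(\Lambda^{k-1}-\Lambda^{k-2})\|_F \le \tau\lambda_{\max}\|Z^k-Z^{k-1}\|_F + |1-\tau| \cdot \|\Lambda^{k-1}-\Lambda^{k-2}\|_F.$$

Adding $-|1-\tau| \cdot \|\Lambda^k-\Lambda^{k-1}\|_F$ to both sides of the above inequality and simplifying the resulting inequality, we obtain that

$$\begin{aligned}
\|\Lambda^k-\Lambda^{k-1}\|_F &\le \frac{\tau\lambda_{\max}}{1-|1-\tau|}\|Z^k-Z^{k-1}\|_F + \frac{|1-\tau|}{1-|1-\tau|}\left( \|\Lambda^{k-1}-\Lambda^{k-2}\|_F - \|\Lambda^k-\Lambda^{k-1}\|_F \right) \\
&= d_1\|Z^k-Z^{k-1}\|_F + d_2\left( \|\Lambda^{k-1}-\Lambda^{k-2}\|_F - \|\Lambda^k-\Lambda^{k-1}\|_F \right),
\end{aligned} \tag{4.36}$$

where we write $d_1 := \frac{\tau\lambda_{\max}}{1-|1-\tau|}$ and $d_2 := \frac{|1-\tau|}{1-|1-\tau|}$ for notational simplicity. Similarly,

$$\|\Lambda^{k-1}-\Lambda^{k-2}\|_F \le d_1\|Z^{k-1}-Z^{k-2}\|_F + d_2\left( \|\Lambda^{k-2}-\Lambda^{k-3}\|_F - \|\Lambda^{k-1}-\Lambda^{k-2}\|_F \right). \tag{4.37}$$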

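Then substituting (4.36) and (4.37) into (4.35) and rearranging terms, we have

$$\begin{aligned}
&\left(1-\tfrac{1}{2\gamma}\right)\|L^{k+1}-L^k\|_F + \left(1-\tfrac{1}{2\gamma}-\tfrac{d_1}{\gamma}\right)\|Z^{k+1}-Z^k\|_F \\
&\le \tfrac{1}{2\gamma}\left( \|L^k-L^{k-1}\|_F - \|L^{k+1}-L^k\|_F \right) + \left(\tfrac{1}{2\gamma}+\tfrac{d_1}{\gamma}\right)\left( \|Z^k-Z^{k-1}\|_F - \|Z^{k+1}-Z^k\|_F \right) \\
&\quad + \tfrac{d_1}{2\gamma}\left( \|Z^{k-1}-Z^{k-2}\|_F - \|Z^k-Z^{k-1}\|_F \right) + \tfrac{d_2}{2\gamma}\left( \|\Lambda^{k-1}-\Lambda^{k-2}\|_F - \|\Lambda^k-\Lambda^{k-1}\|_F \right) \\
&\quad + \tfrac{d_2}{2\gamma}\left( \|\Lambda^{k-2}-\Lambda^{k-3}\|_F - \|\Lambda^{k-1}-\Lambda^{k-2}\|_F \right) + \tfrac{a\gamma}{2c}\Delta_k.
\end{aligned} \tag{4.38}$$

Thus, summing (4.38) from $k = k_1$ to $\infty$, we have

$$\begin{aligned}
&\left(1-\tfrac{1}{2\gamma}\right)\sum_{k=k_1}^{\infty}\|L^{k+1}-L^k\|_F + \left(1-\tfrac{1}{2\gamma}-\tfrac{d_1}{\gamma}\right)\sum_{k=k_1}^{\infty}\|Z^{k+1}-Z^k\|_F \\
&\le \tfrac{1}{2\gamma}\|L^{k_1}-L^{k_1-1}\|_F + \left(\tfrac{1}{2\gamma}+\tfrac{d_1}{\gamma}\right)\|Z^{k_1}-Z^{k_1-1}\|_F + \tfrac{d_1}{2\gamma}\|Z^{k_1-1}-Z^{k_1-2}\|_F \\
&\quad + \tfrac{d_2}{2\gamma}\|\Lambda^{k_1-1}-\Lambda^{k_1-2}\|_F + \tfrac{d_2}{2\gamma}\|\Lambda^{k_1-2}-\Lambda^{k_1-3}\|_F + \tfrac{a\gamma}{2c}\varphi\left(\Theta_{\tau,\beta}(L^{k_1}, S^{k_1}, Z^{k_1}, \Lambda^{k_1}) - \theta^*\right) \\
&< \infty.
\end{aligned}$$

Recall that $\gamma$ introduced in (4.35) is an arbitrary positive constant. Taking $\gamma > \frac{1}{2} + d_1$ and hence $1-\frac{1}{2\gamma} > 1-\frac{1}{2\gamma}-\frac{d_1}{\gamma} > 0$, we have from the above inequality that

$$\sum_{k=k_1}^{\infty}\|L^{k+1}-L^k\|_F < \infty \quad \text{and} \quad \sum_{k=k_1}^{\infty}\|Z^{k+1}-Z^k\|_F < \infty.$$

Hence $\{L^k\}$ and $\{Z^k\}$ are convergent. Additionally, summing (4.36) from $k = k_1$ to $\infty$, we have

$$\sum_{k=k_1}^{\infty}\|\Lambda^k-\Lambda^{k-1}\|_F \le d_1\sum_{k=k_1}^{\infty}\|Z^k-Z^{k-1}\|_F + d_2\|\Lambda^{k_1-1}-\Lambda^{k_1-2}\|_F < \infty,$$

which implies that $\{\Lambda^k\}$ is convergent. Finally, from (4.1d) and (a1) in Assumption 4.1, we see that $\{S^k\}$ is also convergent. Consequently, we conclude that $\{(L^k, S^k, Z^k, \Lambda^k)\}_{k=1}^{\infty}$ is a convergent sequence. This completes the proof. □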

Our convergence analysis relies on Assumption 4.1. While (a3) in Assumption 4.1 appears restrictive since it makes assumptions on the first iterate of Algorithm 1, we show below that this assumption would hold upon a suitable choice of initialization. Specifically, if we initialize at $(L^0, S^0, Z^0, \Lambda^0)$ satisfying

$$\Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^1) \le \Theta_{\tau,\beta}(L^0, S^0, Z^0, \Lambda^0), \tag{4.39a}$$

$$\Theta_{\tau,\beta}(L^0, S^0, Z^0, \Lambda^0) < h_0, \tag{4.39b}$$

then it is easy to check that (a3) in Assumption 4.1 holds. In the next proposition, we demonstrate that (4.39a) can always be satisfied with a suitable initialization. After this, we will propose a specific way to initialize Algorithm 1 for a wide range of problems so that both (4.39a) and (4.39b) are satisfied.

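Proposition 4.5. Suppose that $0 < \tau < \frac{1+\sqrt{5}}{2}$ and $\beta > \bar{\beta}$. If the initialization $(L^0, S^0, Z^0, \Lambda^0)$ is chosen as $(L^0, S^0) \in \mathrm{dom}\,\Psi \times \mathrm{dom}\,\Phi$ and

$$\Lambda^0 = \mathcal{A}^*(D - \mathcal{A}(Z^0)), \tag{4.40}$$

then we have

$$\Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^1) \le \Theta_{\tau,\beta}(L^0, S^0, Z^0, \Lambda^0).$$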

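Proof. First, from (4.1c), we have

$$\begin{aligned}
&0 = \mathcal{A}^*(\mathcal{A}(Z^1)-D) + \Lambda^0 - \beta(\mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^1) \\
&\Longrightarrow\ \mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^1 = \tfrac{1}{\beta}\Lambda^0 + \tfrac{1}{\beta}\mathcal{A}^*(\mathcal{A}(Z^1)-D) = \tfrac{1}{\beta}\mathcal{A}^*\mathcal{A}(Z^1-Z^0),
\end{aligned} \tag{4.41}$$

where the last equality follows from (4.40). Then,

$$\begin{aligned}
&\Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^1) - \Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^0) \\
&= -\langle \Lambda^1-\Lambda^0, \mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^1 \rangle = \tau\beta\|\mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^1\|_F^2 \\
&= (\tau+\theta(\tau))\beta\|\mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^1\|_F^2 - \theta(\tau)\beta\|\mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^1\|_F^2 \\
&= (\tau+\theta(\tau))\beta\left\|\tfrac{1}{\beta}\mathcal{A}^*\mathcal{A}(Z^1-Z^0)\right\|_F^2 - \theta(\tau)\beta\|\mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^1\|_F^2 \\
&\le (\tau+\theta(\tau))\frac{\lambda_{\max}^2}{\beta}\|Z^1-Z^0\|_F^2 - \theta(\tau)\beta\|\mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^1\|_F^2,
\end{aligned} \tag{4.42}$$

where the second equality follows from (4.1d) and the fourth equality follows from (4.41). Additionally, using the same arguments as in the proof of Lemma 4.1 leading to (4.9), (4.10) and (4.11), it is easy to see that

$$\Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^0) - \Theta_{\tau,\beta}(L^1, S^1, Z^0, \Lambda^0) \le -\frac{\lambda_{\min}+\beta}{2}\|Z^1-Z^0\|_F^2 + \theta(\tau)\beta\left( \|\mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^1\|_F^2 - \|\mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^0\|_F^2 \right), \tag{4.43}$$

$$\Theta_{\tau,\beta}(L^1, S^1, Z^0, \Lambda^0) - \Theta_{\tau,\beta}(L^1, S^0, Z^0, \Lambda^0) \le \theta(\tau)\beta\left( \|\mathcal{B}(L^1)+\mathcal{C}(S^1)-Z^0\|_F^2 - \|\mathcal{B}(L^1)+\mathcal{C}(S^0)-Z^0\|_F^2 \right), \tag{4.44}$$

$$\Theta_{\tau,\beta}(L^1, S^0, Z^0, \Lambda^0) - \Theta_{\tau,\beta}(L^0, S^0, Z^0, \Lambda^0) \le \theta(\tau)\beta\left( \|\mathcal{B}(L^1)+\mathcal{C}(S^0)-Z^0\|_F^2 - \|\mathcal{B}(L^0)+\mathcal{C}(S^0)-Z^0\|_F^2 \right). \tag{4.45}$$

Summing (4.42), (4.43), (4.44) and (4.45), we obtain

$$\Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^1) - \Theta_{\tau,\beta}(L^0, S^0, Z^0, \Lambda^0) \le \left[ (\tau+\theta(\tau))\frac{\lambda_{\max}^2}{\beta} - \frac{\lambda_{\min}+\beta}{2} \right]\|Z^1-Z^0\|_F^2 - \theta(\tau)\beta\|\mathcal{B}(L^0)+\mathcal{C}(S^0)-Z^0\|_F^2. \tag{4.46}$$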
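We now consider two cases:

• For $0 < \tau \le 1$, it is easy to see that $\theta(\tau) = 1-\tau$ and

$$\beta > \bar{\beta} \ge \frac{-\lambda_{\min} + \sqrt{\lambda_{\min}^2 + \frac{8}{\tau}\lambda_{\max}^2}}{2}.$$

Then, we have

$$(\tau+\theta(\tau))\frac{\lambda_{\max}^2}{\beta} - \frac{\lambda_{\min}+\beta}{2} = \frac{\lambda_{\max}^2}{\beta} - \frac{\lambda_{\min}+\beta}{2} \le \frac{\lambda_{\max}^2}{\tau\beta} - \frac{\lambda_{\min}+\beta}{2} < 0.$$

• For $1 < \tau < \frac{1+\sqrt{5}}{2}$, it is easy to see that $\theta(\tau) = \frac{(\tau-1)\tau^2}{1+\tau-\tau^2}$ and

$$\beta > \bar{\beta} \ge \frac{-\lambda_{\min} + \sqrt{\lambda_{\min}^2 + \frac{8\tau^2}{1+\tau-\tau^2}\lambda_{\max}^2}}{2}.$$

Then, we have

$$(\tau+\theta(\tau))\frac{\lambda_{\max}^2}{\beta} - \frac{\lambda_{\min}+\beta}{2} = \frac{\tau\lambda_{\max}^2}{(1+\tau-\tau^2)\beta} - \frac{\lambda_{\min}+\beta}{2} < \frac{\tau^2\lambda_{\max}^2}{(1+\tau-\tau^2)\beta} - \frac{\lambda_{\min}+\beta}{2} < 0.$$

Thus, combining the above with (4.46) and $\theta(\tau) \ge 0$, we conclude that

$$\Theta_{\tau,\beta}(L^1, S^1, Z^1, \Lambda^1) \le \Theta_{\tau,\beta}(L^0, S^0, Z^0, \Lambda^0).$$

This completes the proof. □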
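The sign condition used in the two cases above can be checked numerically. The following is a minimal sketch, assuming that $\theta(\tau)$ equals $1-\tau$ for $\tau \le 1$ and $(\tau-1)\tau^2/(1+\tau-\tau^2)$ otherwise, and that the lower bound on $\beta$ is the positive root of the corresponding quadratic; the function names are illustrative and are not taken from the paper.

```python
import math

GOLDEN = (1 + math.sqrt(5)) / 2


def theta(tau):
    """theta(tau) on (0, golden ratio), in the piecewise form assumed above."""
    if not 0 < tau < GOLDEN:
        raise ValueError("tau must lie in (0, (1 + sqrt(5)) / 2)")
    if tau <= 1:
        return 1 - tau
    return (tau - 1) * tau**2 / (1 + tau - tau**2)


def beta_lower_bound(tau, lam_min, lam_max):
    """Positive root of beta^2 + lam_min*beta - 2*factor*lam_max^2 = 0 (assumed form)."""
    factor = 1 / tau if tau <= 1 else tau**2 / (1 + tau - tau**2)
    return (-lam_min + math.sqrt(lam_min**2 + 8 * factor * lam_max**2)) / 2


def z_coefficient(tau, beta, lam_min, lam_max):
    """Coefficient multiplying ||Z^1 - Z^0||_F^2 in the summed inequality."""
    return (tau + theta(tau)) * lam_max**2 / beta - (lam_min + beta) / 2


# Identity map A: lam_min = lam_max = 1. Take beta slightly above the bound.
for tau in (0.2, 0.8, 1.0, 1.3, 1.6):
    beta = 1.001 * beta_lower_bound(tau, 1.0, 1.0)
    print(tau, round(beta, 4), z_coefficient(tau, beta, 1.0, 1.0) < 0)
```

For the identity map and $\tau = 1$ the bound evaluates to 1, and the coefficient is negative for every $\beta$ above the bound in each tested case.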

From Proposition 4.5, we see that if the initialization $(L^0, S^0, Z^0, \Lambda^0)$ is chosen to satisfy the conditions in Proposition 4.5, then (4.39a) holds. Based on this, we can now present one specific way to initialize Algorithm 1 so that both (4.39a) and (4.39b) are satisfied for a class of problems, whose objective functions $\Psi(L)$ and $\Phi(S)$ take forms $\delta_\Omega(L)$ and (1.2), respectively; here, $\Omega$ is a compact convex set.

The initialization we consider is:

$$L^0 = \mathcal{P}_\Omega(\kappa D), \quad S^0 = 0, \quad Z^0 = \mathcal{B}(L^0), \quad \Lambda^0 = \mathcal{A}^*(D - \mathcal{A}(Z^0)), \tag{4.47}$$

where $\kappa$ is a scaling parameter. One can easily check that this initialization satisfies (4.40). Moreover,

$$\Theta_{\tau,\beta}(L^0, S^0, Z^0, \Lambda^0) = \frac{1}{2}\|D - \mathcal{A}(Z^0)\|_F^2 = \frac{1}{2}\|D - \mathcal{A}(\mathcal{B}(\mathcal{P}_\Omega(\kappa D)))\|_F^2.$$

Thus, the condition (4.39b) is equivalent to

$$\frac{1}{2}\|D - \mathcal{A}(\mathcal{B}(\mathcal{P}_\Omega(\kappa D)))\|_F^2 < \liminf_{\|L\|_F + \|S\|_F \to \infty} \Psi(L) + \Phi(S) = \liminf_{\|S\|_F \to \infty} \Phi(S). \tag{4.48}$$

We further discuss this inequality for some concrete examples of $\Phi$ presented in the introduction.

Example 4.1. Suppose that $\Phi$ is coercive. Then $\liminf_{\|S\|_F \to \infty} \Phi(S) = \infty$ and hence (4.48) holds trivially for any choice of $\kappa$.

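Example 4.2. Suppose that $\Phi(S) = \mu \sum_{i,j} \frac{\alpha|s_{ij}|}{1+\alpha|s_{ij}|}$ for $\alpha > 0$. Then $\liminf_{\|S\|_F \to \infty} \Phi(S) = \mu$. Hence (4.48) holds if the parameter $\kappa$ can be chosen so that $\frac{1}{2}\|D - \mathcal{A}(\mathcal{B}(\mathcal{P}_\Omega(\kappa D)))\|_F^2 < \mu$.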

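Example 4.3. Suppose that $\Phi(S) = \mu \sum_{i,j} \int_0^{|s_{ij}|} \min\left(1, (\alpha - t/\mu)_+/(\alpha-1)\right) dt$ for $\alpha > 2$. Then $\liminf_{\|S\|_F \to \infty} \Phi(S) = \frac{1}{2}(\alpha+1)\mu^2$. Hence (4.48) holds if $\kappa$ can be chosen so that $\frac{1}{2}\|D - \mathcal{A}(\mathcal{B}(\mathcal{P}_\Omega(\kappa D)))\|_F^2 < \frac{1}{2}(\alpha+1)\mu^2$.

Example 4.4. Suppose that $\Phi(S) = \mu \sum_{i,j} \int_0^{|s_{ij}|} \left(1 - t/(\alpha\mu)\right)_+ dt$ for $\alpha > 0$. Then, $\liminf_{\|S\|_F \to \infty} \Phi(S) = \frac{1}{2}\alpha\mu^2$. Hence (4.48) holds if $\kappa$ can be chosen so that $\frac{1}{2}\|D - \mathcal{A}(\mathcal{B}(\mathcal{P}_\Omega(\kappa D)))\|_F^2 < \frac{1}{2}\alpha\mu^2$.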

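Example 4.5. Suppose that $\Phi(S) = \mu \sum_{i,j} \left(\mu - (\mu - |s_{ij}|)_+^2/\mu\right)$. Then it is not hard to show that $\liminf_{\|S\|_F \to \infty} \Phi(S) = \mu^2$. Hence (4.48) holds if $\kappa$ can be chosen so that $\frac{1}{2}\|D - \mathcal{A}(\mathcal{B}(\mathcal{P}_\Omega(\kappa D)))\|_F^2 < \mu^2$.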

5. Numerical experiments. In this section, we conduct numerical experiments 
to show the performances of our algorithm. All experiments are run in Matlab R2014b 
on a 64-bit PC with an Intel(R) Core(TM) i7-4790 CPU (3.60GHz) and 16GB of RAM 
equipped with Windows 8.1 OS. 

5.1. Implementation details. 
Testing model. We consider the problem of extracting background/foreground from a given video under different scenarios. Specifically, we consider:

$$\min_{L,S}\ \Phi(S) + \frac{1}{2}\|D - \mathcal{A}(L+S)\|_F^2 \quad \text{s.t.}\ L \in \Omega, \tag{5.1}$$

where $\Omega = \{L \in \mathbb{R}^{m\times n} \mid \|L\|_\infty \le 1,\ L_{:1} = L_{:2} = \cdots = L_{:n}\}$ and $\mathcal{A}$ is a linear map. This model corresponds to (1.1) with $\Psi(L) = \delta_\Omega(L)$ and $\mathcal{B} = \mathcal{C} = \mathcal{I}$. We compare the performances of the ADMM with different choices of $\tau$, as well as the proximal alternating linearized minimization (PALM) proposed in [5], on solving (5.1). For ease of future reference, we recall that the PALM for solving (5.1) is given by

$$\begin{aligned}
L^{k+1} &= \mathcal{P}_\Omega\left( L^k - \frac{1}{c_k}\mathcal{A}^*(\mathcal{A}(L^k+S^k) - D) \right), \\
S^{k+1} &\in \mathop{\mathrm{Argmin}}_{S}\ \Phi(S) + \frac{d_k}{2}\left\| S - S^k + \frac{1}{d_k}\mathcal{A}^*(\mathcal{A}(L^{k+1}+S^k) - D) \right\|_F^2,
\end{aligned}$$

where $c_k$ and $d_k$ are positive numbers.

In our experiments, we consider the following three choices of sparse regularizers $\Phi(S)$:

• bridge regularizer: $\Phi(S) = \mu \sum_{i,j} |s_{ij}|^p$ for $0 < p \le 1$;

• fraction regularizer: $\Phi(S) = \mu \sum_{i,j} \frac{\alpha|s_{ij}|}{1+\alpha|s_{ij}|}$ for $\alpha > 0$;

• logistic regularizer: $\Phi(S) = \mu \sum_{i,j} \log(1 + \alpha|s_{ij}|)$ for $\alpha > 0$;

and two choices of linear map $\mathcal{A}$:

• $\mathcal{A}(L+S) := L+S$: in this case, model (5.1) can be applied to extracting background/foreground from a surveillance video with noise.

• $\mathcal{A}(L+S) := H(L+S)$ with $H$ being the matrix representation of a regular blurring operator (the blurring is assumed to occur frame-wise): in this case, model (5.1) can be applied to extracting background/foreground from a blurred and noisy surveillance video.
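To make the three regularizers concrete, the following minimal sketch evaluates each of them on a small matrix stored as a list of rows. It assumes the bridge, fraction and logistic forms $\mu\sum|s_{ij}|^p$, $\mu\sum \alpha|s_{ij}|/(1+\alpha|s_{ij}|)$ and $\mu\sum\log(1+\alpha|s_{ij}|)$; the function names are illustrative and not part of the paper's code.

```python
import math


def bridge(S, mu, p):
    """Bridge regularizer: mu * sum |s_ij|^p with 0 < p <= 1."""
    return mu * sum(abs(s) ** p for row in S for s in row)


def fraction(S, mu, alpha):
    """Fraction regularizer (assumed form): mu * sum alpha|s| / (1 + alpha|s|)."""
    return mu * sum(alpha * abs(s) / (1 + alpha * abs(s)) for row in S for s in row)


def logistic(S, mu, alpha):
    """Logistic regularizer: mu * sum log(1 + alpha|s|)."""
    return mu * sum(math.log(1 + alpha * abs(s)) for row in S for s in row)


S = [[1.0, -1.0], [0.0, 4.0]]
print(bridge(S, 0.5, 0.5))    # 0.5 * (1 + 1 + 0 + 2)
print(fraction(S, 1.0, 1.0))  # 0.5 + 0.5 + 0 + 0.8
print(logistic(S, 1.0, 1.0))
```

Each entry of the fraction regularizer is bounded by $\mu$, whereas the bridge and logistic terms grow without bound, which is why only the former needs a nontrivial choice of initialization to keep the iterates bounded.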

Testing videos. We choose four real videos, “Hall”, “Bootstrap”, “Fountain” and “ShoppingMall”, from the dataset I2R^ provided by Li et al. [29]. The details of these videos are as follows:

• Hall video contains 200 frames of size 144 x 176 (from airport2001 to airport2200);

• Bootstrap video contains 200 frames of size 120 x 160 (from b01801 to b02000);

• Fountain video contains 200 frames of size 128 x 160 (from Fountain1301 to Fountain1500);

• ShoppingMall video contains 200 frames of size 256 x 320 (from ShoppingMall1501 to ShoppingMall1700).

We show one frame of each testing video under two different scenarios (noisy and noisy 
blurred), and their ground-truth images of foregrounds in Fig. 5.1. Additionally, all 
pixel values of the testing videos are re-scaled into [ 0 , 1 ] in our numerical experiments. 


^This dataset is available in http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html. The 
authors also provide 20 ground-truth images of foregrounds for each video in this dataset. 











[Figure 5.1: image grid. Columns: Hall, Bootstrap, Fountain, ShoppingMall. Rows: noisy; noisy and blurred; ground truth.]

Fig. 5.1. One frame (from left to right: airport2180, b01842, Fountain1440 and ShoppingMall1535) of each testing video under different scenarios (the first two rows) and the ground-truth image of foreground of each testing video (the last row).


Parameters setting. For the ADMM, we use the following heuristics^ to update $\beta$: we initialize $n_s = 0$ and $\beta = 0.6\bar{\beta}$, where $\bar{\beta}$ is given in (4.12). In the $k$-th iteration, we compute

$$\mathrm{fnorm}^k = \|L^k\|_F + \|Z^k\|_F, \qquad \mathrm{succ\_chg}^k = \|L^k-L^{k-1}\|_F + \|Z^k-Z^{k-1}\|_F.$$

Then, we increase $n_s$ by 1 if $\mathrm{succ\_chg}^k > 0.99 \cdot \mathrm{succ\_chg}^{k-1}$. Obviously, $n_s$ is nondecreasing in this procedure. We then update $\beta$ as $1.1\beta$ whenever $\beta \le 1.01\bar{\beta}$ and the sequence satisfies either $n_s > 0.3k$ or $\mathrm{fnorm}^k > 10^{10}$. On the other hand, for PALM,
we set Ck=dk = 
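The penalty-update heuristic described above can be sketched as a single function that is called once per iteration. This is only an illustration under stated assumptions: the function name and argument names are hypothetical, the fraction of iterations used in the stall test is left as a required parameter (`stall_fraction`) rather than hard-coded, and the default blow-up guard `fnorm_max=1e10` is read from the text.

```python
def update_penalty(beta, beta_bar, n_s, k, succ_chg, prev_succ_chg, fnorm,
                   stall_fraction, growth=1.1, cap=1.01, fnorm_max=1e10):
    """One pass of the heuristic; returns the (possibly increased) beta and counter n_s.

    n_s counts iterations whose successive change did not drop by at least 1%.
    beta is multiplied by `growth` only while it is still at most cap * beta_bar
    and the iterates look stalled (n_s large relative to k) or very large.
    """
    if succ_chg > 0.99 * prev_succ_chg:
        n_s += 1
    if beta <= cap * beta_bar and (n_s > stall_fraction * k or fnorm > fnorm_max):
        beta *= growth
    return beta, n_s


# Illustrative call: start from 0.6 * beta_bar with beta_bar = 1.0.
beta, n_s = update_penalty(0.6, 1.0, n_s=5, k=10, succ_chg=1.0,
                           prev_succ_chg=1.0, fnorm=5.0, stall_fraction=0.3)
print(beta, n_s)
```

Because `beta` stops growing once it exceeds `cap * beta_bar`, only finitely many increases can occur, which matches the argument in Remark 4.3.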

We initialize our algorithm and the PALM at the point specified in (4.47) with $\kappa = 1$. Moreover, we terminate our ADMM by the following two-stage criterion^: in each iteration, we check if

$$\frac{\|L^k-L^{k-1}\|_F + \|Z^k-Z^{k-1}\|_F}{\|L^k\|_F + \|Z^k\|_F + 1} < \mathrm{Tol}_{A,1}$$

for some $\mathrm{Tol}_{A,1} > 0$; if it holds, then we further check if

$$\frac{\|S^k-S^{k-1}\|_F + \|\Lambda^k-\Lambda^{k-1}\|_F}{\|S^k\|_F + \|\Lambda^k\|_F + 1} < \mathrm{Tol}_{A,2}$$

^Note from Theorem 4.3(i) that the successive change of each variable goes to zero as $k \to \infty$. Thus, intuitively, it is more favorable to see a decrease in the successive change as $k$ increases. This heuristic is designed based on this intuition.

^We use this two-stage criterion rather than computing the relative errors of all four variables $(L, S, Z, \Lambda)$ in each iteration of our algorithm because computing matrix Frobenius norms can be expensive, especially for large scale problems. This strategy will help reduce the cost per iteration. We examine $\|L^k-L^{k-1}\|_F$ and $\|Z^k-Z^{k-1}\|_F$ in the first stage because these quantities being small intuitively implies that $\|S^k-S^{k-1}\|_F$ and $\|\Lambda^k-\Lambda^{k-1}\|_F$ are small; see the proof of Theorem 4.3, particularly (4.23), (4.24) and the discussions that follow.
for some $\mathrm{Tol}_{A,2} > 0$. We terminate the algorithm if this latter condition is also satisfied. For the PALM, we terminate it when

$$\frac{\|L^k-L^{k-1}\|_F + \|S^k-S^{k-1}\|_F}{\|L^k\|_F + \|S^k\|_F + 1} < \mathrm{Tol}_P$$

for some $\mathrm{Tol}_P > 0$. The specific values of $\mathrm{Tol}_{A,1}$, $\mathrm{Tol}_{A,2}$ and $\mathrm{Tol}_P$ are given in the following experiments.
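The two-stage stopping test for the ADMM can be sketched as follows. This is a minimal illustration with matrices stored as lists of rows; the helper names `fro`, `diff` and `admm_should_stop` are hypothetical. The point of the structure is that the norms involving $S$ and $\Lambda$ are only computed when the first-stage test on $L$ and $Z$ has already passed.

```python
import math


def fro(M):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in M for x in row))


def diff(M, N):
    """Entry-wise difference M - N."""
    return [[a - b for a, b in zip(r, s)] for r, s in zip(M, N)]


def admm_should_stop(L, L_prev, Z, Z_prev, S, S_prev, Lam, Lam_prev, tol1, tol2):
    """Two-stage relative-change test; stage 2 is skipped when stage 1 fails."""
    stage1 = (fro(diff(L, L_prev)) + fro(diff(Z, Z_prev))) / (fro(L) + fro(Z) + 1)
    if stage1 >= tol1:
        return False  # the S and Lambda norms are never computed in this branch
    stage2 = (fro(diff(S, S_prev)) + fro(diff(Lam, Lam_prev))) / (fro(S) + fro(Lam) + 1)
    return stage2 < tol2


L = [[1.0, 0.0]]
Z = [[0.0, 0.0]]
S, S_prev = [[1.0]], [[0.0]]
Lam = [[0.0]]
print(admm_should_stop(L, L, Z, Z, S, S_prev, Lam, Lam, 1e-4, 0.6))
```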

5.2. Comparisons between ADMM with different $\tau$ and PALM. In this subsection, we use the performance profile to evaluate the performances of the ADMM with different $\tau$ and the PALM for extraction under different scenarios. The performance profile is proposed by Dolan and Moré [14] as a tool for evaluating and comparing the performance of a collection of solvers $\mathcal{K}$ on a set of test problems $\mathcal{J}$.

To describe this method, we assume that we have $K$ solvers and $J$ problems, and we use the iteration number as a performance measure. Then, for each problem $j$ and solver $k$, we set

$$\mathrm{iter}_{j,k} = \text{the iteration number required to solve problem } j \text{ by solver } k,$$

and compute the performance ratio

$$r_{j,k} := \frac{\mathrm{iter}_{j,k}}{\min\{\mathrm{iter}_{j,k} : k \in \mathcal{K}\}}. \tag{5.2}$$

The performance profile of iteration numbers is then defined as the distribution function for the performance ratio, i.e.,

$$\rho_k(\nu) := \frac{1}{J}\,\#\{j \in \mathcal{J} : r_{j,k} \le \nu\}$$

for $\nu \ge 1$. Similarly, the performance profile of function values is obtained by using $\mathrm{fval}_{j,k}$ in place of $\mathrm{iter}_{j,k}$ in (5.2), where $\mathrm{fval}_{j,k}$ denotes the function value at the solution given by solver $k$ for solving problem $j$. Generally speaking, for solver $k \in \mathcal{K}$, a higher $\rho_k(\nu)$ indicates a better performance within the factor $\nu$.
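The performance profile defined above is straightforward to compute from a table of costs. The following minimal sketch assumes strictly positive costs and takes `measure[j][k]` to be the cost (for example, the iteration count) of solver `k` on problem `j`; the function name is illustrative.

```python
def performance_profile(measure, nu):
    """Return [rho_k(nu) for each solver k].

    measure[j][k] is the (positive) cost of solver k on problem j, and
    rho_k(nu) is the fraction of problems on which solver k is within a
    factor nu of the best solver for that problem.
    """
    n_problems = len(measure)
    n_solvers = len(measure[0])
    counts = [0] * n_solvers
    for row in measure:
        best = min(row)
        for k, value in enumerate(row):
            if value / best <= nu:
                counts[k] += 1
    return [c / n_problems for c in counts]


# Three problems, two solvers (illustrative numbers only).
iters = [[10, 20], [30, 15], [12, 12]]
print(performance_profile(iters, 1.0))
print(performance_profile(iters, 2.0))
```

At $\nu = 1$ the profile reports how often each solver is (jointly) the best; as $\nu$ grows, the profile of every solver that solves all problems tends to 1.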

In our experiments, we evaluate the following solvers: the ADMM with $\tau = 0.8$, the ADMM with $\tau = 1$, the ADMM with $\tau = 1.6$ and the PALM.

For $\mathcal{A}(L+S) = L+S$, our test problems are described in Table 5.1, where we use the four real videos introduced above as our input data in (5.1), with 3 choices of sparse regularizers, 10 choices of $\mu$, and 6 choices of $p$ and $\alpha$. Thus, we have 4 solvers and a total of 720 test problems, with 240 test problems for each sparse
regularizer. Moreover, we set To^q = 10“^, Tol ^,2 = 5 x 10“^ and Tolp = lO”'^. 
Fig. 5.2 shows the performance profiles of iteration numbers and function values for 
different regularizers under this scenario. 

For $\mathcal{A}(L+S) = H(L+S)$, our test problems are described in Table 5.2, where we use 2 choices of $p$ and $\alpha$. Thus, we have 4 solvers and a total of 240 test problems, with 80 test problems for each sparse regularizer. In our experiments, we use the method described in [23] to generate the blurring matrix $H$, which can be represented as a Kronecker product $H = H_r \otimes H_c$ under the periodic boundary condition. The Matlab codes^ that generate $H_r$ and $H_c$ are shown below, where “frame_size” is the size of each frame:


^The codes are available at http://www.imm.dtu.dk/~pcha/HNO/ as a supplement to the book [23].
[P, center] = psfGauss(frame_size, 1);

[Hr, Hc] = kronDecomp(P, center, 'periodic');

Moreover, we set Tol^j = 5 x 10“^, Tol ^,2 = 10“^ and Tolp = 3 x 10“^. Fig. 5.3 
shows the performance profiles under this scenario. 
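The Kronecker structure matters computationally because a product with $H_r \otimes H_c$ never requires forming the large matrix: by the standard identity $(H_r \otimes H_c)\,\mathrm{vec}(X) = \mathrm{vec}(H_c X H_r^{T})$, with $\mathrm{vec}$ stacking columns, blurring one frame reduces to two small matrix products. The sketch below checks this identity on a tiny example in plain Python; the helper names and the numbers are illustrative and unrelated to the actual blurring matrices used in the experiments.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kron(A, B):
    """Kronecker product A (x) B, built block by block."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def vec(X):
    """Stack the columns of X into one flat list."""
    return [x for col in zip(*X) for x in col]


Hr = [[1, 2], [3, 4]]
Hc = [[0, 1], [1, 1]]
X = [[1, 2], [3, 4]]

# Explicit route: form the 4 x 4 Kronecker matrix and apply it to vec(X).
explicit = [row[0] for row in matmul(kron(Hr, Hc), [[v] for v in vec(X)])]
# Structured route: two 2 x 2 products, no large matrix.
structured = vec(matmul(matmul(Hc, X), transpose(Hr)))
print(explicit, structured)
```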

It is not hard to see from Fig. 5.2 and Fig. 5.3 that the performance profiles of iteration numbers for the ADMM with $\tau = 0.8$ and $\tau = 1$ usually lie above those for the PALM, and their performance profiles of function values are almost the same. This shows that the ADMM with $\tau = 0.8$ or $\tau = 1$ takes fewer iterations for solving the test problems while giving comparable function values. For the bridge regularizer in the case where $\mathcal{A}(L+S) = L+S$ (see Fig. 5.2(a)) and in the case where $\mathcal{A}(L+S) = H(L+S)$ (see Fig. 5.3(a)), we can see that the ADMM with $\tau = 0.8$ slightly outperforms the ADMM with $\tau = 1$ in terms of the number of iterations. For the other regularizers, their performances are comparable. Additionally, for the ADMM with $\tau = 1.6$, we can see from Fig. 5.2 and Fig. 5.3 that it always terminates with the worst function value, although it is always the fastest in the case where $\mathcal{A}(L+S) = H(L+S)$ (see Fig. 5.3).

To better visualize the performance of the algorithms in terms of function values, we also plot $\mathrm{RelErr}^k := |\mathcal{F}(L^k, S^k) - \mathcal{F}_{\min}|/\mathcal{F}_{\min}$ against the number of iterations for each algorithm, where $\mathcal{F}(L^k, S^k)$ denotes the objective value obtained by each algorithm at $(L^k, S^k)$ and $\mathcal{F}_{\min}$ denotes the minimum of the objective values obtained from all algorithms. We only consider the ADMM with $\tau = 0.8$, the ADMM with $\tau = 1$ and the PALM, and terminate them only after at least 500 iterations and
the termination criteria are satisfied with Tolyip = 10“^, Toly!i ,2 = 5 x 10“^ and 
Tolp = 10“®. For brevity, we focus on the scenario A{L + S) = L + S and use 
the “Hall” video. The results are presented in Fig. 5.4, from which we can see that the ADMM with $\tau = 1$ or $\tau = 0.8$ performs better than PALM for those particular instances.
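The relative-error curves can be produced from the recorded objective values as in the following minimal sketch. It assumes that the reference value is the smallest objective value recorded by any algorithm at any iteration and that this value is strictly positive; the function name and the sample numbers are illustrative.

```python
def relative_errors(histories):
    """Map each solver name to its list of RelErr values.

    histories maps a solver name to the objective values recorded at each
    iteration. The reference value is the smallest recorded objective value
    over all solvers (assumed positive).
    """
    f_min = min(min(values) for values in histories.values())
    return {
        name: [abs(v - f_min) / f_min for v in values]
        for name, values in histories.items()
    }


curves = relative_errors({"admm": [4.0, 2.0], "palm": [3.0, 2.5]})
print(curves)
```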


Table 5.1. Problem setting for $\mathcal{A}(L+S) = L+S$

data            $\mu$                             regularizers
4 real videos   5e-1, 1e-1, 5e-2, 1e-2, 5e-3,     bridge: $p$ = 0.2, 0.4, 0.5, 0.6, 0.8, 1
                1e-3, 5e-4, 1e-4, 5e-5, 1e-5      fraction/logistic: $\alpha$ = 0.01, 0.1, 1, 2, 5, 10

Table 5.2. Problem setting for $\mathcal{A}(L+S) = H(L+S)$

data            $\mu$                             regularizers
4 real videos   5e-1, 1e-1, 5e-2, 1e-2, 5e-3,     bridge: $p$ = 0.5, 1
                1e-3, 5e-4, 1e-4, 5e-5, 1e-5      fraction/logistic: $\alpha$ = 1, 2


5.3. Simulation Results. In this subsection, we present some simulation re¬ 
sults for the background/foreground extraction problem. In order to evaluate the 
performance in background/foreground extraction, we compare the support of the 
recovered foreground S* with the support of the ground-truth S by computing the 
following measurement: 


$$\text{F-measure} := 2 \times \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}},$$













where precision and recall are defined as 


$$\text{precision} := \frac{TP}{TP + FP}, \qquad \text{recall} := \frac{TP}{TP + FN},$$


in which, 

• TP stands for true positives: the number of true foreground pixels that are 
recovered; 

• FP stands for false positives: the number of background pixels that are mis- 
detected as foreground; 

• FN stands for false negatives: the number of true foreground pixels that are 
missed. 


The support of the recovered foreground $S^*$ is obtained by thresholding $S^*$ entry-wise with a threshold value (we use 1e-3 in our numerical experiments). We would like to point out that the F-measure varies between 0 and 1 according to the similarity of the supports of $S^*$ and $S$. The higher the F-measure value, the better the recovery accuracy of the support of $S$. The F-measure attains its maximum value 1 if the supports of $S^*$ and $S$ are the same, which means the foreground is recovered completely.
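The F-measure computation can be sketched as follows. This is a minimal illustration under two assumptions that are not spelled out in the text: the ground truth is given as a mask whose nonzero entries mark foreground pixels, and an entry of the recovered foreground counts as detected when its absolute value exceeds the threshold. The function name is hypothetical.

```python
def f_measure(S_rec, S_true, threshold=1e-3):
    """F-measure between the thresholded support of S_rec and the support of S_true."""
    tp = fp = fn = 0
    for row_rec, row_true in zip(S_rec, S_true):
        for rec, true in zip(row_rec, row_true):
            detected = abs(rec) > threshold   # assumed thresholding rule
            is_foreground = true != 0         # assumed ground-truth encoding
            if detected and is_foreground:
                tp += 1
            elif detected:
                fp += 1
            elif is_foreground:
                fn += 1
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


S_rec = [[0.5, 0.0], [0.2, 0.0004]]
S_true = [[1, 0], [0, 1]]
print(f_measure(S_rec, S_true))
```

In this toy example one foreground pixel is found, one background pixel is misdetected and one foreground pixel is missed, so precision and recall both equal 0.5.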

In our experiments below, we choose $\tau = 0.8$ for the ADMM. We also use the aforementioned four real videos as input with 3 choices of sparse regularizers and 2 choices of $p$ and $\alpha$. For each fixed $p$ and $\alpha$, we experiment with different regularization parameters $\mu$ (5e-1, 1e-1, 5e-2, 1e-2, 5e-3, 1e-3, 5e-4, 1e-4, 5e-5, 1e-5) and present only the $\mu$ corresponding to the maximal F-measure.^

Extraction from noisy surveillance videos. In this case, A{L + S) = L + S, Amax = 
Amin = 1 and we set To1a,i = 10“"^, ToU ,2 = 5 x 10“^ and Tol_p = 10“"^. The 
computational results are reported in Table 5.3, where we report p and a, the optimal 
p, the number of iterations, the CPU time (seconds) and F-measure. We also show 
the extracted backgrounds and foregrounds given by the ADMM in Fig. 5.5. 

Extraction from noisy and blurred surveillance videos. In this case, A{L + S) = 
H{L + S), Amax = Amax(i?*Af), Amin = Xniin{H*H) and we set ToU,i = 5 X 10“^, 
Tolyi .2 = 10“^ and Tolp = 3 x 10“^. The blurring matrix H is generated by the same 
method introduced in Subsection 5.2. One frame of each corrupted video is shown 
in the second row in Fig. 5.1. We report the computational results in Table 5.4 and 
show the extracted backgrounds and foregrounds by the ADMM in Fig. 5.6. 

Summary. From the results above, it can be seen that the ADMM with $\tau = 0.8$ performs better than the PALM in the sense that it takes less CPU time for solving most test problems while returning comparable F-measures. The extraction results of our ADMM shown in Fig. 5.5 and Fig. 5.6 are also promising.

6. Concluding remarks. In this paper, we study a general (possibly nonconvex and nonsmooth) model and adapt the ADMM with a general dual step-size $\tau$, which can be chosen in $\left(0, \frac{1+\sqrt{5}}{2}\right)$, to solve it. We establish that any cluster point of the sequence generated by our ADMM gives a stationary point under some assumptions; we also give simple sufficient conditions for these assumptions. Under an additional assumption that a potential function is a Kurdyka-Lojasiewicz function, we can further establish the global convergence of the whole sequence generated by our ADMM. Our computational results demonstrate the efficiency of our algorithm.


^If the F-measures are the same, we pick the $\mu$ that corresponds to the minimal number of iterations.
Table 5.3. Numerical results for extraction from noisy surveillance videos

                                       ADMM                               PALM
Data          regularizer      $\mu$   iter  time (s)  F-measure  $\mu$   iter  time (s)  F-measure
Hall          bri. p = 1.0     5e-02   10    3.21      0.7562     5e-02   19    3.96      0.7560
              bri. p = 0.5     1e-02   32    11.26     0.7634     1e-02   36    9.29      0.7624
              fra. α = 1.0     5e-02   23    8.26      0.7578     5e-02   33    8.53      0.7578
              fra. α = 2.0     5e-02   12    4.17      0.7368     5e-02   15    3.69      0.7371
              log. α = 1.0     5e-02   12    21.12     0.7566     5e-02   39    68.70     0.7576
              log. α = 2.0     5e-02   12    16.00     0.7368     5e-02   16    29.04     0.7368
Bootstrap     bri. p = 1.0     1e-01   14    3.30      0.8180     1e-01   19    3.15      0.8180
              bri. p = 0.5     5e-02   23    6.77      0.8206     5e-02   22    4.93      0.8209
              fra. α = 1.0     1e-01   15    4.91      0.8163     1e-01   20    5.32      0.8165
              fra. α = 2.0     1e-01   14    4.18      0.8264     1e-01   16    3.72      0.8261
              log. α = 1.0     1e-01   16    21.92     0.8195     1e-01   22    28.62     0.8195
              log. α = 2.0     1e-01   12    8.91      0.8363     1e-01   10    6.50      0.8363
Fountain      bri. p = 1.0     1e-01   9     2.19      0.7749     1e-01   7     1.10      0.7749
              bri. p = 0.5     5e-02   13    3.54      0.7000     5e-02   11    2.13      0.6922
              fra. α = 1.0     1e-01   9     2.39      0.7717     1e-01   8     1.63      0.7717
              fra. α = 2.0     5e-02   10    2.82      0.7717     5e-02   9     1.87      0.7717
              log. α = 1.0     1e-01   9     13.41     0.7738     1e-01   7     9.65      0.7738
              log. α = 2.0     5e-02   9     12.46     0.7717     5e-02   8     11.51     0.7717
ShoppingMall  bri. p = 1.0     1e-01   10    9.66      0.7046     1e-01   13    8.73      0.7043
              bri. p = 0.5     1e-02   39    52.39     0.7087     1e-02   79    83.62     0.7078
              fra. α = 1.0     1e-01   12    14.33     0.7055     1e-01   18    16.95     0.7055
              fra. α = 2.0     5e-02   15    18.46     0.7062     5e-02   26    25.34     0.7064
              log. α = 1.0     1e-01   11    66.96     0.7055     1e-01   16    94.06     0.7055
              log. α = 2.0     5e-02   12    40.23     0.7057     5e-02   18    74.83     0.7057

Note that our ADMM may not be beneficial when $\mathcal{B}$ or $\mathcal{C}$ has no special 
structure, because the corresponding subproblems of ADMM may not have closed-form 
solutions. Nonetheless, as in [31,44,45], it may be possible to add "proximal terms" to 
simplify the subproblems of our ADMM. In addition, in view of the recent work [46], it 
may also be possible to study the convergence of our ADMM for some specially structured 
nonconvex $\Psi$. These are possible future research directions. 

[Figure 5.2 panels: (a) bridge regularizer, (b) fraction regularizer, (c) logistic 
regularizer; each panel shows an "iter" profile on the left and an "fval" profile on the 
right.] 

Fig. 5.2. Performance profiles of iteration numbers (denoted by "iter" on the left) and function 
values (denoted by "fval" on the right) for each sparse regularizer with $\mathcal{A}(L + S) = L + S$. The 
blown-up subfigures are used to highlight the differences in a specific range of v. 
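
As a reminder of how such curves are typically obtained, the following sketch computes a performance profile in the usual sense: for each solver, the fraction of test problems on which its cost is within a factor of the best cost over all solvers. Whether the figures here use exactly this convention (for example, a logarithmic scale for the ratio) is an assumption; the function name, the solver names and the data are hypothetical.

```python
def performance_profile(costs, thresholds):
    """Compute performance profiles from per-problem costs.

    costs: dict mapping a solver name to a list of positive costs
           (e.g. iteration counts), one entry per test problem, same order.
    thresholds: ratio thresholds (each >= 1) at which to evaluate the profile.
    Returns a dict mapping each solver to the list of fractions of problems
    whose cost ratio to the best solver is at most the given threshold.
    """
    solvers = list(costs)
    n_problems = len(costs[solvers[0]])
    # Best (smallest) cost attained on each problem by any solver.
    best = [min(costs[s][p] for s in solvers) for p in range(n_problems)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] for p in range(n_problems)]
        profile[s] = [
            sum(1 for r in ratios if r <= t) / n_problems for t in thresholds
        ]
    return profile


# Hypothetical iteration counts of two solvers on four test problems.
costs = {"solver_a": [10, 20, 30, 12], "solver_b": [12, 18, 60, 12]}
profile = performance_profile(costs, [1.0, 1.5, 2.0])
```

A curve that lies higher at small thresholds indicates a solver that is the best, or close to the best, on a larger share of the test problems.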

[Figure 5.3 panels: (a) bridge regularizer, (b) fraction regularizer, (c) logistic 
regularizer; each panel shows an "iter" profile on the left and an "fval" profile on the 
right.] 

Fig. 5.3. Performance profiles of iteration numbers (denoted by "iter" on the left) and function 
values (denoted by "fval" on the right) for each sparse regularizer with $\mathcal{A}(L + S) = H(L + S)$. The 
blown-up subfigures are used to highlight the differences in a specific range of v. 

[Figure 5.4 panels: (a) bridge regularizer, with $p$ = 1.0, $\mu$ = 1e-2 (left) and 
$p$ = 0.2, $\mu$ = 1e-4 (right); (b) fraction regularizer, with $\alpha$ = 1, $\mu$ = 1e-3 (left) 
and $\alpha$ = 0.1, $\mu$ = 1e-2 (right); (c) logistic regularizer, with $\alpha$ = 1, $\mu$ = 1e-3 
(left) and $\alpha$ = 0.1, $\mu$ = 1e-2 (right).] 

Fig. 5.4. The RelErr vs the number of iterations for each sparse regularizer 

Fig. 5.5. Extracted backgrounds and foregrounds given by the ADMM for noisy surveillance 
videos. 

Fig. 5.6. Extracted backgrounds and foregrounds given by the ADMM for noisy and blurred 
surveillance videos. 